Book Chapter

Representing Contextual Relations with Sanskrit Word Embeddings

TLDR
This work presents a simple yet effective approach of representing Sanskrit words in a continuous vector space and uses word embeddings in similarity, compositionality and visualization tasks to test its efficacy.
Abstract
Language processing of Sanskrit presents various challenges in the field of computational linguistics. Prosodic, orthographic and inflectional complexities encountered in Sanskrit texts make it difficult to apply linguistic analysis methods developed for western European languages. The inadequacy of contemporary computational approaches to the analysis of Sanskrit is vividly apparent. In this exposition, we focus on the challenge of learning syntactic and semantic similarities in a rich body of Sanskrit literature. We present a simple yet effective approach to representing Sanskrit words in a continuous vector space. We utilise word embeddings in similarity, compositionality and visualization tasks to test their efficacy. Experiments show that our method produces interpretable vector offsets that exhibit shared relationships.
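The vector-offset behaviour the abstract describes can be illustrated with a toy example. The sketch below uses hypothetical 4-dimensional embeddings and invented Romanized word labels (raja/rajni/nara/nari and all vector values are assumptions for illustration, not the paper's data) to show how an analogy query is answered by cosine similarity over vector offsets:

```python
import numpy as np

# Toy 4-dimensional embeddings for illustration only; real vectors would be
# learned from a Sanskrit corpus. Words and values here are assumptions.
embeddings = {
    "raja":  np.array([0.9, 0.1, 0.8, 0.1]),  # "king"
    "rajni": np.array([0.9, 0.1, 0.1, 0.8]),  # "queen"
    "nara":  np.array([0.1, 0.9, 0.8, 0.1]),  # "man"
    "nari":  np.array([0.1, 0.9, 0.1, 0.8]),  # "woman"
}

def cosine(u, v):
    # Cosine similarity: dot product of the two vectors over their norms.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# A shared relation appears as a (roughly) constant vector offset:
# raja - nara should be close to rajni - nari.
query = embeddings["raja"] - embeddings["nara"] + embeddings["nari"]

# Answer the analogy by finding the nearest word to the query vector,
# excluding the query word itself, as is conventional.
best = max((w for w in embeddings if w != "raja"),
           key=lambda w: cosine(query, embeddings[w]))
print(best)  # rajni
```

With real embeddings the offsets are only approximately shared, so the nearest-neighbour search over cosine similarity is what makes the analogy recoverable.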


Citations
Journal Article

Word2vec-based latent semantic analysis (W2V-LSA) for topic modeling: A study on blockchain technology trend analysis

TL;DR: A new topic modeling method called W2V-LSA, based on Word2vec and spherical k-means clustering, better captures and represents the context of a corpus; it can be a competitive alternative for topic modeling and provide direction for future research in technology trend analysis.
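Spherical k-means, the clustering step W2V-LSA relies on, differs from ordinary k-means in that points and centroids are kept on the unit sphere and assignment uses cosine similarity. A minimal sketch, assuming synthetic 2-D data as a stand-in for real Word2vec vectors:

```python
import numpy as np

def spherical_kmeans(X, k, iters=20):
    # Normalize so that cosine similarity reduces to a dot product.
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    # Farthest-point initialization keeps this demo deterministic: start
    # from the first point, then add the point least similar to it.
    centroids = [X[0]]
    for _ in range(k - 1):
        sims = np.max(X @ np.array(centroids).T, axis=1)
        centroids.append(X[np.argmin(sims)])
    centroids = np.array(centroids)
    for _ in range(iters):
        # Assign each point to the centroid with highest cosine similarity.
        labels = np.argmax(X @ centroids.T, axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):
                c = members.sum(axis=0)
                centroids[j] = c / np.linalg.norm(c)  # re-project to sphere
    return labels

rng = np.random.default_rng(0)
# Two bundles of points with nearly orthogonal directions.
X = np.vstack([rng.normal([5.0, 0.0], 0.1, size=(10, 2)),
               rng.normal([0.0, 5.0], 0.1, size=(10, 2))])
labels = spherical_kmeans(X, 2)
```

Re-normalizing the centroid after each mean update is what distinguishes this from Lloyd's algorithm; it keeps clustering sensitive to direction rather than magnitude, which suits length-varying word vectors.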
Journal Article

Developing bug severity prediction models using word2vec

TL;DR: Results show that a bigger window size enhances the performance of the classifiers; however, the influence of the minimum word count parameter was mixed and depended on the selected data sets. Of the classifiers used, Random Forest and XGBoost could classify the severity level for classes with few records or a rare occurrence of class-specific words.
Book Chapter

Emotion Recognition of Speech in Hindi Using Dimensionality Reduction and Machine Learning Techniques

TL;DR: In this paper, Naive Bayes classification was applied to a speech corpus collected from non-dramatic actors, and the results showed that NBC generates better results than existing machine learning (ML) classification techniques such as k-nearest neighbor (KNN), support vector machine (SVM) and decision tree (DT).
Proceedings Article

Filtering and Extended Vocabulary based Translation for Low-resource Language Pair of Sanskrit-Hindi

TL;DR: In this article, a zero-shot transformer architecture is used to filter the training corpus of the low-resource Sanskrit-Hindi language pair, and the approach is then applied to a high-resource language pair to demonstrate its efficacy.
Book Chapter

Hausa Character Recognition Using Logistic Regression

TL;DR: In this article, a technique for the recognition of Hausa characters using logistic regression (LR) was developed, and the system's user interface was built with the C# programming language.
References
Journal Article

Visualizing Data using t-SNE

TL;DR: A new technique called t-SNE visualizes high-dimensional data by giving each datapoint a location in a two- or three-dimensional map. It is a variation of Stochastic Neighbor Embedding that is much easier to optimize and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map.
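In practice, t-SNE is usually applied to word vectors through an off-the-shelf implementation. A minimal sketch, assuming scikit-learn is available and using random vectors as a stand-in for learned embeddings:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Stand-in for learned word embeddings: 50 "words" in 20 dimensions.
vectors = rng.normal(size=(50, 20))

# Perplexity must be smaller than the number of samples; small values
# emphasize local neighbourhood structure.
coords = TSNE(n_components=2, perplexity=5.0, init="random",
              random_state=0).fit_transform(vectors)
print(coords.shape)  # (50, 2)
```

The resulting 2-D coordinates can be scatter-plotted and annotated with the words to inspect whether semantically related items land near each other.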
Proceedings Article

Distributed Representations of Words and Phrases and their Compositionality

TL;DR: This paper presents a simple method for finding phrases in text, and shows that learning good vector representations for millions of phrases is possible and describes a simple alternative to the hierarchical softmax called negative sampling.
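The negative-sampling objective mentioned in this TL;DR trains each (center, context) pair as a logistic classification against a few sampled non-context words. A minimal numpy sketch of one update step (all names, sizes and the learning rate are illustrative assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_step(center, context, negatives, W_in, W_out, lr=0.1):
    """One skip-gram negative-sampling update (illustrative sketch).

    center/context are word indices; negatives is a list of sampled
    word indices treated as non-contexts (label 0)."""
    v = W_in[center]
    loss = 0.0
    grad_v = np.zeros_like(v)
    for idx, label in [(context, 1.0)] + [(n, 0.0) for n in negatives]:
        u = W_out[idx]
        p = sigmoid(v @ u)                 # P(label = 1 | center, idx)
        loss += -np.log(p if label else 1.0 - p)
        g = p - label                      # d loss / d (v . u)
        grad_v += g * u
        W_out[idx] -= lr * g * v           # update output vector
    W_in[center] -= lr * grad_v            # update input (word) vector
    return loss

rng = np.random.default_rng(0)
W_in = rng.normal(0.0, 0.1, (10, 8))   # input embeddings, vocab of 10
W_out = rng.normal(0.0, 0.1, (10, 8))  # output embeddings
# Repeated updates on a single example should drive its loss down.
losses = [sgns_step(0, 1, [2, 3], W_in, W_out) for _ in range(50)]
```

The point of the trick is that each update touches only 1 + |negatives| output vectors instead of the whole vocabulary, which is what makes training on large corpora tractable.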
Journal Article

Finding Structure in Time

TL;DR: A proposal along these lines first described by Jordan (1986) which involves the use of recurrent links in order to provide networks with a dynamic memory and suggests a method for representing lexical categories and the type/token distinction is developed.
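The recurrent links described here can be sketched in a few lines: the previous hidden state is fed back in alongside the current input, so identical inputs seen after different histories produce different states. A toy, untrained example (weights and sizes are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 4, 8
W_xh = rng.normal(0.0, 0.5, (n_hidden, n_in))      # input -> hidden
W_hh = rng.normal(0.0, 0.5, (n_hidden, n_hidden))  # hidden -> hidden (recurrent)

def elman_step(x, h_prev):
    # The recurrent term W_hh @ h_prev is the network's dynamic memory.
    return np.tanh(W_xh @ x + W_hh @ h_prev)

# Same input, different histories -> different hidden states.
x = np.ones(n_in)
h_a = elman_step(x, np.zeros(n_hidden))  # no history
h_b = elman_step(x, h_a)                 # history of one prior step
print(np.allclose(h_a, h_b))  # False: the hidden state carries context
```

This context-carrying state is what lets such networks represent sequential structure, such as the lexical-category regularities the paper discusses.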
Journal Article

A neural probabilistic language model

TL;DR: The authors propose to learn a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences, which can be expressed in terms of these representations.
Journal Article

Data clustering: 50 years beyond K-means

TL;DR: A brief overview of clustering is provided, well known clustering methods are summarized, the major challenges and key issues in designing clustering algorithms are discussed, and some of the emerging and useful research directions are pointed out.