Book Chapter

Representing Contextual Relations with Sanskrit Word Embeddings

TLDR
This work presents a simple yet effective approach of representing Sanskrit words in a continuous vector space and uses word embeddings in similarity, compositionality and visualization tasks to test its efficacy.
Abstract
Language processing of Sanskrit presents various challenges in the field of computational linguistics. Prosodic, orthographic and inflectional complexities encountered in Sanskrit texts make it difficult to apply linguistic analysis methods developed for western European languages. The inadequacy of contemporary computational approaches to the analysis of Sanskrit is vividly apparent. In this exposition, we focus on the challenge of learning syntactic and semantic similarities in a rich body of Sanskrit literature. We present a simple yet effective approach to representing Sanskrit words in a continuous vector space. We utilise word embeddings in similarity, compositionality and visualization tasks to test their efficacy. Experiments show that our method produces interpretable vector offsets that exhibit shared relationships.
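The vector-offset behaviour the abstract describes can be illustrated with a toy example. The sketch below uses hypothetical 4-dimensional embeddings and invented Romanized word labels (raja/rajni/nara/nari and all vector values are assumptions for illustration, not the paper's data) to show how an analogy query is answered by cosine similarity over vector offsets:

```python
import numpy as np

# Toy 4-dimensional embeddings for illustration only; real vectors would be
# learned from a Sanskrit corpus. Words and values here are assumptions.
embeddings = {
    "raja":  np.array([0.9, 0.1, 0.8, 0.1]),  # "king"
    "rajni": np.array([0.9, 0.1, 0.1, 0.8]),  # "queen"
    "nara":  np.array([0.1, 0.9, 0.8, 0.1]),  # "man"
    "nari":  np.array([0.1, 0.9, 0.1, 0.8]),  # "woman"
}

def cosine(u, v):
    # Cosine similarity: dot product of the two vectors over their norms.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# A shared relation appears as a (roughly) constant vector offset:
# raja - nara should be close to rajni - nari.
query = embeddings["raja"] - embeddings["nara"] + embeddings["nari"]

# Answer the analogy by finding the nearest word to the query vector,
# excluding the query word itself, as is conventional.
best = max((w for w in embeddings if w != "raja"),
           key=lambda w: cosine(query, embeddings[w]))
print(best)  # rajni
```

With real embeddings the offsets are only approximately shared, so the nearest-neighbour search over cosine similarity is what makes the analogy recoverable.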


Citations
Journal Article

Word2vec-based latent semantic analysis (W2V-LSA) for topic modeling: A study on blockchain technology trend analysis

TL;DR: A new topic modeling method called W2V-LSA, based on Word2vec and spherical k-means clustering, better captures and represents the context of a corpus; it can be a competitive alternative for topic modeling and provide direction for future research in technology trend analysis.
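Spherical k-means, the clustering step W2V-LSA relies on, differs from ordinary k-means in that points and centroids are kept on the unit sphere and assignment uses cosine similarity. A minimal sketch, assuming synthetic 2-D data as a stand-in for real Word2vec vectors:

```python
import numpy as np

def spherical_kmeans(X, k, iters=20):
    # Normalize so that cosine similarity reduces to a dot product.
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    # Farthest-point initialization keeps this demo deterministic: start
    # from the first point, then add the point least similar to it.
    centroids = [X[0]]
    for _ in range(k - 1):
        sims = np.max(X @ np.array(centroids).T, axis=1)
        centroids.append(X[np.argmin(sims)])
    centroids = np.array(centroids)
    for _ in range(iters):
        # Assign each point to the centroid with highest cosine similarity.
        labels = np.argmax(X @ centroids.T, axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):
                c = members.sum(axis=0)
                centroids[j] = c / np.linalg.norm(c)  # re-project to sphere
    return labels

rng = np.random.default_rng(0)
# Two bundles of points with nearly orthogonal directions.
X = np.vstack([rng.normal([5.0, 0.0], 0.1, size=(10, 2)),
               rng.normal([0.0, 5.0], 0.1, size=(10, 2))])
labels = spherical_kmeans(X, 2)
```

Re-normalizing the centroid after each mean update is what distinguishes this from Lloyd's algorithm; it keeps clustering sensitive to direction rather than magnitude, which suits length-varying word vectors.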
Journal Article

Developing bug severity prediction models using word2vec

TL;DR: Results show that a bigger window size enhances the performance of the classifiers; however, the influence of the minimum word count parameter was mixed and depended on the selected data sets. Of the classifiers used, Random Forest and XGBoost could classify the severity level for classes with few records or a rare occurrence of class-specific words.
Book Chapter

Emotion Recognition of Speech in Hindi Using Dimensionality Reduction and Machine Learning Techniques

TL;DR: In this paper, Naive Bayes classification was applied to a speech corpus collected from non-dramatic actors, and the results showed that NBC generates better results than existing machine learning (ML) classification techniques such as k-nearest neighbor (KNN), support vector machine (SVM) and decision tree (DT).
Proceedings Article

Filtering and Extended Vocabulary based Translation for Low-resource Language Pair of Sanskrit-Hindi

TL;DR: In this article, a zero-shot transformer architecture is used to filter the training corpus of the low-resource Sanskrit-Hindi language pair, and the approach is then applied to a high-resource language pair to demonstrate its efficacy.
Book Chapter

Hausa Character Recognition Using Logistic Regression

TL;DR: In this article, a technique for the recognition of Hausa characters using logistic regression (LR) was developed, and the system's user interface was built with the C# programming language.
References
Journal Article

Visualizing Data using t-SNE

TL;DR: A new technique called t-SNE visualizes high-dimensional data by giving each datapoint a location in a two- or three-dimensional map. It is a variation of Stochastic Neighbor Embedding that is much easier to optimize and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map.
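In practice, t-SNE is usually applied to word vectors through an off-the-shelf implementation. A minimal sketch, assuming scikit-learn is available and using random vectors as a stand-in for learned embeddings:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Stand-in for learned word embeddings: 50 "words" in 20 dimensions.
vectors = rng.normal(size=(50, 20))

# Perplexity must be smaller than the number of samples; small values
# emphasize local neighbourhood structure.
coords = TSNE(n_components=2, perplexity=5.0, init="random",
              random_state=0).fit_transform(vectors)
print(coords.shape)  # (50, 2)
```

The resulting 2-D coordinates can be scatter-plotted and annotated with the words to inspect whether semantically related items land near each other.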
Proceedings Article

Distributed Representations of Words and Phrases and their Compositionality

TL;DR: This paper presents a simple method for finding phrases in text, and shows that learning good vector representations for millions of phrases is possible and describes a simple alternative to the hierarchical softmax called negative sampling.
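The negative-sampling objective mentioned in this TL;DR trains each (center, context) pair as a logistic classification against a few sampled non-context words. A minimal numpy sketch of one update step (all names, sizes and the learning rate are illustrative assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_step(center, context, negatives, W_in, W_out, lr=0.1):
    """One skip-gram negative-sampling update (illustrative sketch).

    center/context are word indices; negatives is a list of sampled
    word indices treated as non-contexts (label 0)."""
    v = W_in[center]
    loss = 0.0
    grad_v = np.zeros_like(v)
    for idx, label in [(context, 1.0)] + [(n, 0.0) for n in negatives]:
        u = W_out[idx]
        p = sigmoid(v @ u)                 # P(label = 1 | center, idx)
        loss += -np.log(p if label else 1.0 - p)
        g = p - label                      # d loss / d (v . u)
        grad_v += g * u
        W_out[idx] -= lr * g * v           # update output vector
    W_in[center] -= lr * grad_v            # update input (word) vector
    return loss

rng = np.random.default_rng(0)
W_in = rng.normal(0.0, 0.1, (10, 8))   # input embeddings, vocab of 10
W_out = rng.normal(0.0, 0.1, (10, 8))  # output embeddings
# Repeated updates on a single example should drive its loss down.
losses = [sgns_step(0, 1, [2, 3], W_in, W_out) for _ in range(50)]
```

The point of the trick is that each update touches only 1 + |negatives| output vectors instead of the whole vocabulary, which is what makes training on large corpora tractable.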
Journal Article

Finding Structure in Time

TL;DR: A proposal along these lines first described by Jordan (1986) which involves the use of recurrent links in order to provide networks with a dynamic memory and suggests a method for representing lexical categories and the type/token distinction is developed.
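The recurrent links described here can be sketched in a few lines: the previous hidden state is fed back in alongside the current input, so identical inputs seen after different histories produce different states. A toy, untrained example (weights and sizes are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 4, 8
W_xh = rng.normal(0.0, 0.5, (n_hidden, n_in))      # input -> hidden
W_hh = rng.normal(0.0, 0.5, (n_hidden, n_hidden))  # hidden -> hidden (recurrent)

def elman_step(x, h_prev):
    # The recurrent term W_hh @ h_prev is the network's dynamic memory.
    return np.tanh(W_xh @ x + W_hh @ h_prev)

# Same input, different histories -> different hidden states.
x = np.ones(n_in)
h_a = elman_step(x, np.zeros(n_hidden))  # no history
h_b = elman_step(x, h_a)                 # history of one prior step
print(np.allclose(h_a, h_b))  # False: the hidden state carries context
```

This context-carrying state is what lets such networks represent sequential structure, such as the lexical-category regularities the paper discusses.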
Journal Article

A neural probabilistic language model

TL;DR: The authors propose to learn a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences, which can be expressed in terms of these representations.
Journal Article

Data clustering: 50 years beyond K-means

TL;DR: A brief overview of clustering is provided, well known clustering methods are summarized, the major challenges and key issues in designing clustering algorithms are discussed, and some of the emerging and useful research directions are pointed out.