Topic

Word embedding

About: Word embedding is a research topic. Over its lifetime, 4,683 publications on this topic have received 153,378 citations. The topic is also known as: word embeddings.


Papers
Journal ArticleDOI
TL;DR: This work proposes an ontology and latent Dirichlet allocation (OLDA)-based topic modeling and word embedding approach for sentiment classification, which achieves an accuracy of 93%, showing that the proposed approach is effective.
Abstract: Social networks play a key role in providing a new approach to collecting information regarding mobility and transportation services. To study this information, sentiment analysis can make decent observations to support intelligent transportation systems (ITSs) in examining traffic control and management systems. However, sentiment analysis faces technical challenges: extracting meaningful information from social network platforms, and the transformation of extracted data into valuable information. In addition, accurate topic modeling and document representation are other challenging tasks in sentiment analysis. We propose an ontology and latent Dirichlet allocation (OLDA)-based topic modeling and word embedding approach for sentiment classification. The proposed system retrieves transportation content from social networks, removes irrelevant content to extract meaningful information, and generates topics and features from extracted data using OLDA. It also represents documents using word embedding techniques, and then employs lexicon-based approaches to enhance the accuracy of the word embedding model. The proposed ontology and the intelligent model are developed using Web Ontology Language and Java, respectively. Machine learning classifiers are used to evaluate the proposed word embedding system. The method achieves accuracy of 93%, which shows that the proposed approach is effective for sentiment classification.
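The document-representation step described above (averaged word embeddings enhanced with a lexicon-based feature) can be sketched in plain Python. The tiny word vectors and sentiment lexicon below are illustrative stand-ins, not the paper's trained OLDA model or its actual lexicon.

```python
# Sketch: represent a document as the average of its word vectors, with a
# lexicon-derived polarity score appended as an extra feature.
# All vectors and lexicon entries here are hypothetical toy values.

word_vectors = {
    "traffic": [0.2, 0.1, 0.0],
    "jam":     [0.4, -0.3, 0.1],
    "smooth":  [-0.1, 0.5, 0.2],
}

# Hypothetical sentiment lexicon: word -> polarity in [-1, 1].
lexicon = {"jam": -0.8, "smooth": 0.9}

def represent(doc_tokens):
    """Average the word vectors, then append the mean lexicon polarity."""
    dim = len(next(iter(word_vectors.values())))
    vecs = [word_vectors[t] for t in doc_tokens if t in word_vectors]
    avg = ([sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
           if vecs else [0.0] * dim)
    polarity = [lexicon.get(t, 0.0) for t in doc_tokens]
    avg.append(sum(polarity) / len(polarity) if polarity else 0.0)
    return avg

features = represent(["traffic", "jam", "smooth"])
print(features)  # 3-dim averaged embedding followed by 1 lexicon score
```

In a real pipeline, this feature vector would then be fed to the machine learning classifiers the abstract mentions.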

113 citations

Journal ArticleDOI
TL;DR: Results show that applying word embedding with ensemble methods and SMOTE can achieve more than a 15% average improvement in F1 score over the baseline; F1, the harmonic mean of precision and recall, is considered a better performance measure than accuracy for imbalanced datasets.
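The two ingredients of this result can be sketched in plain Python: a SMOTE-style oversampler (synthesising minority-class points by interpolating between existing ones) and the F1 score. This simplified oversampler interpolates between random minority pairs; the real SMOTE algorithm interpolates toward k-nearest neighbours (e.g. imbalanced-learn's SMOTE).

```python
import random

def smote_like_oversample(minority, n_new, seed=0):
    """Create n_new synthetic points on segments between random minority pairs."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)
        gap = rng.random()  # position along the segment from a to b
        synthetic.append([ai + gap * (bi - ai) for ai, bi in zip(a, b)])
    return synthetic

def f1_score(tp, fp, fn):
    """F1 is the harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote_like_oversample(minority, n_new=4)
print(len(new_points))  # 4 synthetic minority samples
```

Because each synthetic point is a convex combination of two minority samples, it always lies between them, which is what keeps the oversampled class plausible.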

112 citations

Journal ArticleDOI
TL;DR: The paper describes the use of a self-learning hierarchical LSTM (HLSTM) technique for classifying hatred and trolling content in social-media code-mixed data; the HLSTM-based method recognizes the context of a hatred word by mining the user's intention for using that word in the sentence.
Abstract: The paper describes the use of a self-learning hierarchical LSTM (HLSTM) technique for classifying hatred and trolling content in social-media code-mixed data. Hierarchical LSTM-based learning is a novel architecture inspired by neural learning models. The proposed HLSTM model is trained to identify the hatred and trolling words present in social-media content, and is equipped with a self-learning and prediction mechanism for annotating hatred words in the transliteration domain. The Hindi–English data are labeled as Hindi, English, or hatred for classification. Word-embedding and character-embedding features are used for word representation in the sentence to detect hatred words. The HLSTM-based method helps recognize the context of a hatred word by mining the user's intention for using that word in the sentence. Extensive experiments suggest that the HLSTM-based classification model achieves an accuracy of 97.49%, outperforming standard models such as BLSTM, CRF, LR, SVM, Random Forest, and Decision Tree, especially when hatred and trolling words are present in the social-media data.

111 citations

Journal ArticleDOI
TL;DR: A novel hybrid deep learning model is proposed that strategically combines different word embeddings (Word2Vec, FastText, character-level embedding) with different deep learning methods (LSTM, GRU, BiLSTM, CNN) and classifies texts by sentiment.
Abstract: The massive use of social media platforms such as Twitter and Facebook by all kinds of organizations has increased the volume of critical individual feedback on situations, events, products, and services, and sentiment classification plays an important role in evaluating this feedback. At present, deep learning models such as long short-term memory (LSTM), gated recurrent unit (GRU), bidirectional long short-term memory (BiLSTM), and convolutional neural network (CNN) are prevalently preferred for sentiment classification. Moreover, word embeddings such as Word2Vec and FastText map words to vectors of real numbers in which closely related words lie near one another. However, both deep learning and word embedding methods have strengths and weaknesses; combining the strengths of deep learning models with those of word embeddings is the key to high-performance sentiment classification in natural language processing (NLP). In the present study, we propose a novel hybrid deep learning model that strategically combines different word embeddings (Word2Vec, FastText, character-level embedding) with different deep learning methods (LSTM, GRU, BiLSTM, CNN). The proposed model extracts features from the different deep learning methods and word embeddings, combines these features, and classifies texts by sentiment. To verify its performance, several basic deep learning models were created for a series of experiments. Compared with models from past studies, the proposed model offers better sentiment classification performance.
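The hybrid pattern described above (run several encoders, concatenate their features, then classify) can be sketched without any deep-learning framework. The two toy "encoders" below, word-level vector averaging and character-trigram counts, stand in for the paper's Word2Vec/FastText and character-level branches; the vectors and trigram vocabulary are hypothetical.

```python
from collections import Counter

def word_encoder(tokens, vectors):
    """Word-level branch: average word vectors (toy stand-in for Word2Vec)."""
    vecs = [vectors[t] for t in tokens if t in vectors] or [[0.0, 0.0]]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def char_encoder(text, vocab):
    """Character-level branch: counts of a fixed set of character trigrams."""
    grams = Counter(text[i:i + 3] for i in range(len(text) - 2))
    return [float(grams[g]) for g in vocab]

# Hypothetical 2-dimensional word vectors and a tiny trigram vocabulary.
vectors = {"good": [0.9, 0.1], "film": [0.2, 0.3]}
trigram_vocab = ["goo", "ood", "fil", "ilm"]

text = "good film"
# Concatenate the branches' features -- the core of the hybrid approach.
combined = word_encoder(text.split(), vectors) + char_encoder(text, trigram_vocab)
print(combined)  # word-branch features followed by char-branch features
```

In the paper's model, each branch would be a trained network (LSTM, GRU, BiLSTM, or CNN over an embedding) rather than these hand-built functions, but the concatenate-then-classify structure is the same.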

111 citations

Journal ArticleDOI
TL;DR: The findings suggest that relatively small domain-specific input corpora drawn from Twitter are better at extracting meaningful semantic relationships than generic pre-trained Word2Vec or GloVe; the accuracy of the word vectors for identifying crisis-related actionable tweets is also explored.
Abstract: Unstructured tweet feeds are becoming the source of real-time information for various events. However, extracting actionable information in real time from this unstructured text data is a challenging task. Hence, researchers are employing word embedding approaches to classify unstructured text data. We set our study in the contexts of the 2014 Ebola and 2016 Zika outbreaks and probed the accuracy of domain-specific word vectors for identifying crisis-related actionable tweets. Our findings suggest that relatively small domain-specific input corpora drawn from the Twitter corpus are better at extracting meaningful semantic relationships than generic pre-trained Word2Vec (trained on Google News) or GloVe (from the Stanford NLP group). However, quality domain-specific tweet corpora are normally scant during the early stages of an outbreak, and identifying actionable tweets at that stage is crucial to stemming its proliferation. To overcome this challenge, we consider scholarly abstracts related to the Ebola and Zika viruses from PubMed and probe the efficiency of cross-domain resource utilization for word vector generation. Our findings demonstrate the relevance of PubMed abstracts for training when Twitter data (as input corpus) are scant during the early stages of an outbreak. Thus, this approach can be implemented to handle future outbreaks in real time. We also explore the accuracy of our word vectors for various model architectures and hyper-parameter settings. We observe that Skip-gram accuracies are better than CBOW, and higher dimensions yield better accuracy.
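The Skip-gram/CBOW contrast mentioned in the abstract comes down to how training examples are built from a context window: Skip-gram predicts each context word from the centre word, while CBOW predicts the centre word from its pooled context. This stdlib-only sketch generates those training pairs only; it does not train vectors (a real experiment would use, e.g., gensim's Word2Vec with its `sg` parameter).

```python
def skipgram_pairs(tokens, window=2):
    """(center, context) pairs: one training example per context word."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def cbow_pairs(tokens, window=2):
    """(context-list, center) pairs: one training example per position."""
    pairs = []
    for i, center in enumerate(tokens):
        context = [tokens[j]
                   for j in range(max(0, i - window),
                                  min(len(tokens), i + window + 1))
                   if j != i]
        pairs.append((context, center))
    return pairs

tokens = ["ebola", "outbreak", "in", "west", "africa"]
print(len(skipgram_pairs(tokens)), len(cbow_pairs(tokens)))  # 14 5
```

Skip-gram produces one example per (centre, context-word) pair while CBOW produces one per position, which is part of why Skip-gram tends to do better on small corpora such as the early-outbreak tweet sets studied here.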

111 citations


Network Information
Related Topics (5)
Recurrent neural network
29.2K papers, 890K citations
87% related
Unsupervised learning
22.7K papers, 1M citations
86% related
Deep learning
79.8K papers, 2.1M citations
85% related
Reinforcement learning
46K papers, 1M citations
84% related
Graph (abstract data type)
69.9K papers, 1.2M citations
84% related
Performance Metrics

No. of papers in the topic in previous years:

Year    Papers
2023    317
2022    716
2021    736
2020    1,025
2019    1,078
2018    788