Topic

Word embedding

About: Word embedding is a research topic. Over its lifetime, 4,683 publications on this topic have received 153,378 citations. The topic is also known as: word embeddings.


Papers
Journal ArticleDOI
TL;DR: This work proposes an ontology and latent Dirichlet allocation (OLDA)-based topic modeling and word embedding approach for sentiment classification, which achieves an accuracy of 93%, showing that the proposed approach is effective.
Abstract: Social networks play a key role in providing a new approach to collecting information regarding mobility and transportation services. To study this information, sentiment analysis can make decent observations to support intelligent transportation systems (ITSs) in examining traffic control and management systems. However, sentiment analysis faces technical challenges: extracting meaningful information from social network platforms, and the transformation of extracted data into valuable information. In addition, accurate topic modeling and document representation are other challenging tasks in sentiment analysis. We propose an ontology and latent Dirichlet allocation (OLDA)-based topic modeling and word embedding approach for sentiment classification. The proposed system retrieves transportation content from social networks, removes irrelevant content to extract meaningful information, and generates topics and features from extracted data using OLDA. It also represents documents using word embedding techniques, and then employs lexicon-based approaches to enhance the accuracy of the word embedding model. The proposed ontology and the intelligent model are developed using Web Ontology Language and Java, respectively. Machine learning classifiers are used to evaluate the proposed word embedding system. The method achieves accuracy of 93%, which shows that the proposed approach is effective for sentiment classification.
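The document-representation step described above (averaged word embeddings enhanced with a lexicon-based feature) can be sketched in plain Python. The tiny word vectors and sentiment lexicon below are illustrative stand-ins, not the paper's trained OLDA model or its actual lexicon.

```python
# Sketch: represent a document as the average of its word vectors, with a
# lexicon-derived polarity score appended as an extra feature.
# All vectors and lexicon entries here are hypothetical toy values.

word_vectors = {
    "traffic": [0.2, 0.1, 0.0],
    "jam":     [0.4, -0.3, 0.1],
    "smooth":  [-0.1, 0.5, 0.2],
}

# Hypothetical sentiment lexicon: word -> polarity in [-1, 1].
lexicon = {"jam": -0.8, "smooth": 0.9}

def represent(doc_tokens):
    """Average the word vectors, then append the mean lexicon polarity."""
    dim = len(next(iter(word_vectors.values())))
    vecs = [word_vectors[t] for t in doc_tokens if t in word_vectors]
    avg = ([sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
           if vecs else [0.0] * dim)
    polarity = [lexicon.get(t, 0.0) for t in doc_tokens]
    avg.append(sum(polarity) / len(polarity) if polarity else 0.0)
    return avg

features = represent(["traffic", "jam", "smooth"])
print(features)  # 3-dim averaged embedding followed by 1 lexicon score
```

In a real pipeline, this feature vector would then be fed to the machine learning classifiers the abstract mentions.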

113 citations

Journal ArticleDOI
TL;DR: Results show that applying word embedding with ensemble methods and SMOTE can achieve more than a 15% average improvement in F1 score over the baseline; F1, the harmonic mean of precision and recall, is considered a better performance measure than accuracy for imbalanced datasets.
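The two ingredients of this result can be sketched in plain Python: a SMOTE-style oversampler (synthesising minority-class points by interpolating between existing ones) and the F1 score. This simplified oversampler interpolates between random minority pairs; the real SMOTE algorithm interpolates toward k-nearest neighbours (e.g. imbalanced-learn's SMOTE).

```python
import random

def smote_like_oversample(minority, n_new, seed=0):
    """Create n_new synthetic points on segments between random minority pairs."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)
        gap = rng.random()  # position along the segment from a to b
        synthetic.append([ai + gap * (bi - ai) for ai, bi in zip(a, b)])
    return synthetic

def f1_score(tp, fp, fn):
    """F1 is the harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote_like_oversample(minority, n_new=4)
print(len(new_points))  # 4 synthetic minority samples
```

Because each synthetic point is a convex combination of two minority samples, it always lies between them, which is what keeps the oversampled class plausible.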

112 citations

Journal ArticleDOI
TL;DR: The paper describes the use of a self-learning hierarchical LSTM (HLSTM) technique for classifying hatred and trolling content in social-media code-mixed data; the HLSTM-based method recognizes the context of a hatred word by mining the user's intention for using that word in the sentence.
Abstract: The paper describes the use of a self-learning hierarchical LSTM (HLSTM) technique for classifying hatred and trolling content in social-media code-mixed data. Hierarchical LSTM-based learning is a novel architecture inspired by neural learning models. The proposed HLSTM model is trained to identify the hatred and trolling words present in social-media content, and is equipped with a self-learning and prediction mechanism for annotating hatred words in the transliteration domain. The Hindi–English data are labeled as Hindi, English, or hatred for classification. Word-embedding and character-embedding features are used for word representation in the sentence to detect hatred words. The HLSTM-based method helps recognize the context of a hatred word by mining the user's intention for using that word in the sentence. Extensive experiments suggest that the HLSTM-based classification model achieves an accuracy of 97.49%, outperforming standard models such as BLSTM, CRF, LR, SVM, Random Forest, and Decision Tree, especially when hatred and trolling words are present in the social-media data.

111 citations

Journal ArticleDOI
TL;DR: A novel hybrid deep learning model is proposed that strategically combines different word embeddings (Word2Vec, FastText, character-level embedding) with different deep learning methods (LSTM, GRU, BiLSTM, CNN) and classifies texts by sentiment.
Abstract: The massive use of social media platforms such as Twitter and Facebook by all kinds of organizations has increased the volume of critical individual feedback on situations, events, products, and services, and sentiment classification plays an important role in evaluating this feedback. At present, deep learning models such as long short-term memory (LSTM), gated recurrent unit (GRU), bidirectional long short-term memory (BiLSTM), and convolutional neural network (CNN) are prevalently preferred for sentiment classification. Moreover, word embeddings such as Word2Vec and FastText map words to vectors of real numbers in which closely related words lie near one another. However, both deep learning and word embedding methods have strengths and weaknesses; combining the strengths of deep learning models with those of word embeddings is the key to high-performance sentiment classification in natural language processing (NLP). In the present study, we propose a novel hybrid deep learning model that strategically combines different word embeddings (Word2Vec, FastText, character-level embedding) with different deep learning methods (LSTM, GRU, BiLSTM, CNN). The proposed model extracts features from the different deep learning methods and word embeddings, combines these features, and classifies texts by sentiment. To verify its performance, several basic deep learning models were created for a series of experiments. Compared with models from past studies, the proposed model offers better sentiment classification performance.
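The hybrid pattern described above (run several encoders, concatenate their features, then classify) can be sketched without any deep-learning framework. The two toy "encoders" below, word-level vector averaging and character-trigram counts, stand in for the paper's Word2Vec/FastText and character-level branches; the vectors and trigram vocabulary are hypothetical.

```python
from collections import Counter

def word_encoder(tokens, vectors):
    """Word-level branch: average word vectors (toy stand-in for Word2Vec)."""
    vecs = [vectors[t] for t in tokens if t in vectors] or [[0.0, 0.0]]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def char_encoder(text, vocab):
    """Character-level branch: counts of a fixed set of character trigrams."""
    grams = Counter(text[i:i + 3] for i in range(len(text) - 2))
    return [float(grams[g]) for g in vocab]

# Hypothetical 2-dimensional word vectors and a tiny trigram vocabulary.
vectors = {"good": [0.9, 0.1], "film": [0.2, 0.3]}
trigram_vocab = ["goo", "ood", "fil", "ilm"]

text = "good film"
# Concatenate the branches' features -- the core of the hybrid approach.
combined = word_encoder(text.split(), vectors) + char_encoder(text, trigram_vocab)
print(combined)  # word-branch features followed by char-branch features
```

In the paper's model, each branch would be a trained network (LSTM, GRU, BiLSTM, or CNN over an embedding) rather than these hand-built functions, but the concatenate-then-classify structure is the same.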

111 citations

Journal ArticleDOI
TL;DR: The findings suggest that relatively small domain-specific input corpora drawn from Twitter are better at extracting meaningful semantic relationships than generic pre-trained Word2Vec or GloVe; the accuracy of the word vectors for identifying crisis-related actionable tweets is also explored.
Abstract: Unstructured tweet feeds are becoming the source of real-time information for various events. However, extracting actionable information in real time from this unstructured text data is a challenging task. Hence, researchers are employing word embedding approaches to classify unstructured text data. We set our study in the contexts of the 2014 Ebola and 2016 Zika outbreaks and probed the accuracy of domain-specific word vectors for identifying crisis-related actionable tweets. Our findings suggest that relatively small domain-specific input corpora drawn from the Twitter corpus are better at extracting meaningful semantic relationships than generic pre-trained Word2Vec (trained on Google News) or GloVe (from the Stanford NLP group). However, quality domain-specific tweet corpora are normally scant during the early stages of an outbreak, and identifying actionable tweets at that stage is crucial to stemming its proliferation. To overcome this challenge, we consider scholarly abstracts related to the Ebola and Zika viruses from PubMed and probe the efficiency of cross-domain resource utilization for word vector generation. Our findings demonstrate the relevance of PubMed abstracts for training when Twitter data (as input corpus) are scant during the early stages of an outbreak. Thus, this approach can be implemented to handle future outbreaks in real time. We also explore the accuracy of our word vectors for various model architectures and hyper-parameter settings. We observe that Skip-gram accuracies are better than CBOW, and higher dimensions yield better accuracy.
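The Skip-gram/CBOW contrast mentioned in the abstract comes down to how training examples are built from a context window: Skip-gram predicts each context word from the centre word, while CBOW predicts the centre word from its pooled context. This stdlib-only sketch generates those training pairs only; it does not train vectors (a real experiment would use, e.g., gensim's Word2Vec with its `sg` parameter).

```python
def skipgram_pairs(tokens, window=2):
    """(center, context) pairs: one training example per context word."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def cbow_pairs(tokens, window=2):
    """(context-list, center) pairs: one training example per position."""
    pairs = []
    for i, center in enumerate(tokens):
        context = [tokens[j]
                   for j in range(max(0, i - window),
                                  min(len(tokens), i + window + 1))
                   if j != i]
        pairs.append((context, center))
    return pairs

tokens = ["ebola", "outbreak", "in", "west", "africa"]
print(len(skipgram_pairs(tokens)), len(cbow_pairs(tokens)))  # 14 5
```

Skip-gram produces one example per (centre, context-word) pair while CBOW produces one per position, which is part of why Skip-gram tends to do better on small corpora such as the early-outbreak tweet sets studied here.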

111 citations


Network Information
Related Topics (5)
Recurrent neural network
29.2K papers, 890K citations
87% related
Unsupervised learning
22.7K papers, 1M citations
86% related
Deep learning
79.8K papers, 2.1M citations
85% related
Reinforcement learning
46K papers, 1M citations
84% related
Graph (abstract data type)
69.9K papers, 1.2M citations
84% related
Performance Metrics

No. of papers in the topic in previous years:

Year    Papers
2023    317
2022    716
2021    736
2020    1,025
2019    1,078
2018    788