Topic

Word embedding

About: Word embedding is a research topic. Over its lifetime, 4,683 publications have been published within this topic, receiving 153,378 citations. The topic is also known as: word embeddings.


Papers
Journal ArticleDOI
TL;DR: A comparative analysis of machine learning and deep learning models for identifying suicidal thoughts on the social media platform Twitter reveals that the RF model achieves the highest classification score among the machine learning algorithms, while deep learning classifiers trained with word embeddings perform better still, led by the BiLSTM model.
Abstract: Social networks are essential resources for learning about people's opinions and feelings on various issues, as users share their views with friends and family. Suicidal ideation detection via online social network analysis has emerged as an important and difficult research topic in the fields of NLP and psychology in recent years. With proper exploitation of the information in social media, the complicated early symptoms of suicidal ideation can be discovered, and many lives can thus be saved. This study offers a comparative analysis of multiple machine learning and deep learning models for identifying suicidal thoughts on the social media platform Twitter. The principal purpose of our research is to achieve better model performance than prior work in recognizing early indications with high accuracy and averting suicide attempts. To this end, we applied text pre-processing and feature extraction approaches such as CountVectorizer and word embedding, and trained several machine learning and deep learning models. Experiments were conducted on a dataset of 49,178 instances retrieved from live tweets matching 18 suicidal and non-suicidal keywords via the Python Tweepy API. Our experimental findings reveal that the RF model achieves the highest classification score among the machine learning algorithms, with an accuracy of 93% and an F1 score of 0.92. However, the deep learning classifiers trained with word embeddings outperform the machine learning models, with the BiLSTM model reaching an accuracy of 93.6% and an F1 score of 0.93.
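
As an illustration of the classical ML side of such a pipeline, here is a minimal sketch pairing CountVectorizer features with a random forest in scikit-learn; the file name and column names (tweets.csv, text, label) are hypothetical placeholders, not the authors' data or exact settings.

```python
# Minimal sketch: bag-of-words features + random forest, as in the
# paper's classical ML baseline. File and column names are hypothetical.
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

df = pd.read_csv("tweets.csv")  # hypothetical file: tweet text + binary label
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=42
)

# Token-count features (the CountVectorizer step described above)
vectorizer = CountVectorizer(max_features=20000, stop_words="english")
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

clf = RandomForestClassifier(n_estimators=200, random_state=42)
clf.fit(X_train_vec, y_train)
print(classification_report(y_test, clf.predict(X_test_vec)))
```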

19 citations

Journal ArticleDOI
Meng Wang
03 Dec 2020 - PLOS ONE
TL;DR: The results enrich and develop the theory of the tourism service supply chain, providing a reference for constructing a personalized tourism service system.
Abstract: Recently, more personalized travel modes, such as individual travel and self-guided travel, have emerged in the tourism industry. The service models of traditional tourism limit the diversity of service options and can no longer fully meet the individual needs of tourists. The aim of this work is to integrate the sparse tourism information on the Internet, thereby providing more convenient, faster, and more personalized tourism services. To address the shortcomings of traditional tourism recommendation systems, a deep learning-based method for classifying tourism product information is proposed. The method uses word embedding in the data pre-processing stage. A Convolutional Neural Network (CNN) processes the review information of users and tourism service items, while a Deep Neural Network (DNN) processes their essential attributes. Factorization machine technology is then used to learn the interactions between the extracted features and improve the prediction model. The results show that the proposed model maintains a precision of 64.2% when generating personalized recommendation lists for users, and the sensitivity and accuracy of the recommendation lists are better than those of other algorithms. Adding the DNN, the word embedding method, and the factorization machine model improves precision by 30%, 33.3%, and 40%, respectively. Model accuracy is highest with 40 hidden factors, 100 convolution kernels, and a 100+50 hidden-layer combination. Compared with traditional methods, the proposed algorithm provides users with personalized travel products more accurately in personalized travel recommendation. The results enrich and develop the theory of the tourism service supply chain, providing a reference for constructing a personalized tourism service system.
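
The factorization machine component can be made concrete with its standard second-order interaction identity. Below is a minimal NumPy sketch under assumed dimensions; it stands in for the paper's learned interaction layer rather than reproducing it, and all sizes and values are made up.

```python
# Minimal sketch of the second-order factorization machine (FM) term used
# to model interactions between extracted features. NumPy only; the
# feature vector here is a random stand-in for the CNN/DNN outputs.
import numpy as np

rng = np.random.default_rng(0)
n_features, k = 64, 8                  # k latent factors per feature (assumed)
x = rng.random(n_features)             # stand-in for concatenated CNN/DNN features

w0 = 0.0                               # global bias
w = rng.normal(size=n_features)        # first-order (linear) weights
V = rng.normal(size=(n_features, k))   # latent factor matrix

# FM identity: sum_{i<j} <v_i, v_j> x_i x_j
#            = 0.5 * sum_f [ (sum_i V[i,f] x_i)^2 - sum_i V[i,f]^2 x_i^2 ]
linear = w0 + w @ x
interactions = 0.5 * np.sum((V.T @ x) ** 2 - (V.T ** 2) @ (x ** 2))
score = linear + interactions          # predicted user-item affinity
print(score)
```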

19 citations

Proceedings ArticleDOI
01 Nov 2020
TL;DR: Two novel numeral embedding methods that handle the out-of-vocabulary (OOV) problem for numerals are proposed and shown to be effective on four intrinsic and extrinsic tasks: word similarity, embedding numeracy, numeral prediction, and sequence labeling.
Abstract: Word embedding is an essential building block of deep learning methods for natural language processing. Although word embedding has been extensively studied over the years, the problem of how to effectively embed numerals, a special subset of words, is still under-explored. Existing word embedding methods do not learn numeral embeddings well because there is an infinite number of numerals and their individual appearances in training corpora are highly scarce. In this paper, we propose two novel numeral embedding methods that handle the out-of-vocabulary (OOV) problem for numerals. We first induce a finite set of prototype numerals using either a self-organizing map or a Gaussian mixture model. We then represent the embedding of a numeral as a weighted average of the prototype numeral embeddings. Numeral embeddings represented in this manner can be plugged into existing word embedding learning approaches such as skip-gram for training. We evaluated our methods on four intrinsic and extrinsic tasks: word similarity, embedding numeracy, numeral prediction, and sequence labeling, and showed their effectiveness.
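
A minimal sketch of the prototype-based idea, assuming a Gaussian mixture over log-scaled numeral values (the paper also tries a self-organizing map) and randomly initialized prototype embeddings; in the actual method these are trained jointly with skip-gram, and all component counts and dimensions below are made up.

```python
# Minimal sketch: embed any numeral (including OOV ones) as a weighted
# average of a finite set of prototype embeddings, with weights given by
# soft assignment under a Gaussian mixture model.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
train_numerals = rng.lognormal(mean=2.0, sigma=1.5, size=5000)  # toy corpus numerals

# 1. Induce prototype numerals via a GMM over signed-log-scaled values.
signed_log = np.sign(train_numerals) * np.log1p(np.abs(train_numerals))
gmm = GaussianMixture(n_components=16, random_state=0).fit(signed_log.reshape(-1, 1))

dim = 100
# Random stand-ins; trained jointly with skip-gram in the paper.
prototype_embeddings = rng.normal(size=(16, dim))

def embed_numeral(value: float) -> np.ndarray:
    """Posterior-weighted average of prototype embeddings."""
    z = np.array([[np.sign(value) * np.log1p(abs(value))]])
    weights = gmm.predict_proba(z)[0]      # soft assignment to prototypes
    return weights @ prototype_embeddings

print(embed_numeral(1234.5).shape)         # (100,)
```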

19 citations

Book ChapterDOI
24 Sep 2018
TL;DR: Results suggest that word embedding models slightly outperform the alternatives under consideration, with the advantage of not requiring any language-specific lexical resources.
Abstract: This work presents a study in the Natural Language Processing field aiming to recognise personality traits in Portuguese written text. To this end, we first built a corpus of Facebook status updates labelled with the personality traits of their authors, from which we trained a number of computational models of personality recognition. The models range from a standard approach relying on lexical knowledge from the LIWC dictionary, among other resources, to purely text-based methods such as bag-of-words and word embeddings. Results suggest that the word embedding models slightly outperform the alternatives under consideration, with the advantage of not requiring any language-specific lexical resources.
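
To make the embedding-based setup concrete, here is a minimal sketch that averages pre-trained word vectors per status update and feeds them to a linear classifier; the vector file, toy texts, and labels are hypothetical placeholders, and the authors' actual features and classifier may differ.

```python
# Minimal sketch: mean of pre-trained word vectors as a document feature,
# then a linear classifier over those features.
import numpy as np
from sklearn.linear_model import LogisticRegression

def load_vectors(path):
    """Read word2vec-style text vectors: 'word v1 v2 ... vd' per line."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

vectors = load_vectors("pt_vectors.txt")   # hypothetical Portuguese embeddings
dim = len(next(iter(vectors.values())))

def doc_vector(text):
    words = [vectors[w] for w in text.lower().split() if w in vectors]
    return np.mean(words, axis=0) if words else np.zeros(dim, dtype=np.float32)

texts = ["adoro meu trabalho", "hoje foi um dia dificil"]  # toy status updates
y = [1, 0]                                 # e.g. high/low score on one trait
X = np.stack([doc_vector(t) for t in texts])
clf = LogisticRegression().fit(X, y)
```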

19 citations

Posted Content
TL;DR: Wang et al. investigate a convolutional attention network called CAN for Chinese NER, which consists of a character-based CNN with a local-attention layer and a gated recurrent unit (GRU) with a global self-attention layer to capture information from adjacent characters and sentence contexts.
Abstract: Named entity recognition (NER) in Chinese is essential but difficult because of the lack of natural delimiters. Chinese Word Segmentation (CWS) is therefore usually considered the first step for Chinese NER. However, models based on word-level embeddings and lexicon features often suffer from segmentation errors and out-of-vocabulary (OOV) words. In this paper, we investigate a Convolutional Attention Network, called CAN, for Chinese NER, which consists of a character-based convolutional neural network (CNN) with a local-attention layer and a gated recurrent unit (GRU) with a global self-attention layer to capture information from adjacent characters and sentence contexts. Moreover, unlike other models, ours depends on no external resources such as lexicons and employs small character embeddings, which makes it more practical. Extensive experimental results show that our approach outperforms state-of-the-art methods without word embeddings or external lexicon resources on datasets from different domains, including the Weibo, MSRA, and Chinese Resume NER datasets.
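
A drastically simplified PyTorch sketch of this architecture's shape: character embeddings feed a CNN over adjacent characters, then a bidirectional GRU, then a global self-attention layer, then per-character tag scores. The local-attention component is omitted and every size below is an assumption, so this illustrates the idea rather than reproducing the authors' model.

```python
# Simplified CAN-style tagger: char embeddings -> CNN (local context) ->
# BiGRU (sentence context) -> global self-attention -> per-char tag scores.
import torch
import torch.nn as nn

class TinyCAN(nn.Module):
    def __init__(self, n_chars=4000, emb=50, hidden=100, n_tags=9):
        super().__init__()
        self.emb = nn.Embedding(n_chars, emb)            # small char embeddings
        self.cnn = nn.Conv1d(emb, hidden, kernel_size=3, padding=1)
        self.gru = nn.GRU(hidden, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.MultiheadAttention(2 * hidden, num_heads=4, batch_first=True)
        self.out = nn.Linear(2 * hidden, n_tags)

    def forward(self, chars):                            # chars: (batch, seq)
        x = self.emb(chars)                              # (B, T, E)
        x = torch.relu(self.cnn(x.transpose(1, 2))).transpose(1, 2)  # adjacent chars
        h, _ = self.gru(x)                               # sentence context
        h, _ = self.attn(h, h, h)                        # global self-attention
        return self.out(h)                               # (B, T, n_tags)

scores = TinyCAN()(torch.randint(0, 4000, (2, 20)))
print(scores.shape)  # torch.Size([2, 20, 9])
```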

19 citations


Network Information
Related Topics (5)
Recurrent neural network: 29.2K papers, 890K citations, 87% related
Unsupervised learning: 22.7K papers, 1M citations, 86% related
Deep learning: 79.8K papers, 2.1M citations, 85% related
Reinforcement learning: 46K papers, 1M citations, 84% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 84% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    317
2022    716
2021    736
2020    1,025
2019    1,078
2018    788