Topic

Word embedding

About: Word embedding is a research topic. Over its lifetime, 4,683 publications have been published within this topic, receiving 153,378 citations. The topic is also known as: word embeddings.


Papers
Proceedings Article
03 May 2017
TL;DR: In this article, the authors presented ten word embedding data sets from Twitter and showed how to use the data sets in some NLP tasks, such as tweet sentiment analysis and tweet topic classification.
Abstract: A word embedding is a low-dimensional, dense and real-valued vector representation of a word. Word embeddings have been used in many NLP tasks. They are usually generated from a large text corpus. The embedding of a word captures both its syntactic and semantic aspects. Tweets are short, noisy and have unique lexical and semantic features that are different from other types of text. Therefore, it is necessary to have word embeddings learned specifically from tweets. In this paper, we present ten word embedding data sets. In addition to the data sets learned from just tweet data, we also built embedding sets from the general data and the combination of tweets and the general data. The general data consist of news articles, Wikipedia data and other web data. These ten embedding models were learned from about 400 million tweets and 7 billion words from the general data. In this paper, we also present two experiments demonstrating how to use the data sets in some NLP tasks, such as tweet sentiment analysis and tweet topic classification tasks.

35 citations
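
As a minimal sketch of the idea in the abstract above (a word as a dense, real-valued vector), the following compares words by cosine similarity and averages the vectors of a tweet's tokens into a single feature vector. The 4-dimensional vectors, the `EMBEDDINGS` table, and both function names are hypothetical illustrations, not values or code from the paper's data sets.

```python
import math

# Hypothetical 4-dimensional embeddings for illustration only; real models
# learn the values from a corpus and use many more dimensions.
EMBEDDINGS = {
    "good": [0.8, 0.1, 0.3, 0.0],
    "great": [0.7, 0.2, 0.4, 0.1],
    "bad": [-0.6, 0.1, 0.2, 0.9],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def average_embedding(tokens, embeddings):
    """Mean of the vectors of known tokens; out-of-vocabulary tokens are skipped."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]
```

An averaged vector like this is one simple way to feed a short text into a classifier; whether the paper's experiments pool embeddings this way is not stated in the abstract.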

Journal Article
TL;DR: In this article, a secondary analysis of Medical Information Mart for Intensive Care III (MIMIC-III) data was conducted to predict the risk of mortality of critically ill patients.
Abstract: Diabetes mellitus is a prevalent metabolic disease characterized by chronic hyperglycemia. The avalanche of healthcare data is accelerating precision and personalized medicine. Artificial intelligence and algorithm-based approaches are becoming more and more vital to support clinical decision-making. These methods are able to augment health care providers by taking away some of their routine work and enabling them to focus on critical issues. However, few studies have used predictive modeling to uncover associations between comorbidities in ICU patients and diabetes. This study aimed to use Unified Medical Language System (UMLS) resources, involving machine learning and natural language processing (NLP) approaches to predict the risk of mortality. We conducted a secondary analysis of Medical Information Mart for Intensive Care III (MIMIC-III) data. Different machine learning modeling and NLP approaches were applied. Domain knowledge in health care is built on the dictionaries created by experts who defined the clinical terminologies such as medications or clinical symptoms. This knowledge is valuable to identify information from text notes that assert a certain disease. Knowledge-guided models can automatically extract knowledge from clinical notes or biomedical literature that contains conceptual entities and relationships among these various concepts. Mortality classification was based on the combination of knowledge-guided features and rules. UMLS entity embedding and convolutional neural network (CNN) with word embeddings were applied. Concept Unique Identifiers (CUIs) with entity embeddings were utilized to build clinical text representations. The best configuration of the employed machine learning models yielded a competitive AUC of 0.97. Machine learning models along with NLP of clinical notes are promising to assist health care providers to predict the risk of mortality of critically ill patients. 
UMLS resources and clinical notes are powerful and important tools to predict mortality in diabetic patients in the critical care setting. The knowledge-guided CNN model is effective (AUC = 0.97) for learning hidden features.

35 citations
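
The AUC reported in the abstract above can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The sketch below computes it that way from labels and scores; the function name `roc_auc` and the toy inputs in the usage note are assumptions for illustration and have nothing to do with the MIMIC-III data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5).

    labels: iterable of 0/1 ground-truth values (1 = positive class).
    scores: iterable of model scores, higher meaning "more likely positive".
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

For example, `roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])` ranks three of the four positive/negative pairs correctly. The quadratic loop is fine for a sketch; rank-based formulas scale better on large cohorts.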

Proceedings Article
01 Nov 2018
TL;DR: The authors' models generalize well over all aspects and achieve state-of-the-art performance on 4 out of 7 aspects compared to the baseline framework.
Abstract: Sentiment analysis can categorize the overall opinion of a sentence or a document. However, a single sentence can contain more than one opinion. This problem is addressed by aspect-based sentiment analysis. We conduct experiments on this problem using an Indonesian dataset with a two-step process: aspect detection and sentiment classification. For aspect detection, we compare two deep neural network models with different input vectors and topologies: a word embedding vector processed by a gated recurrent unit (GRU), and a bag-of-words vector processed by a fully-connected layer. For sentiment classification, we also compare two deep neural network approaches. The first uses word embeddings, a sentiment lexicon and POS tags as the input vector, with a bi-GRU-based topology. The second uses an aspect matrix to rescale the word embedding vector as the input vector, with a convolutional neural network (CNN)-based topology. Our work is compared to a baseline framework which uses a different model for each aspect. The dataset has approximately 9800 reviews collected from various categories on popular online marketplaces in Indonesia. Our models generalize well over all aspects and achieve state-of-the-art performance on 4 out of 7 aspects compared to the baseline framework.

35 citations
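
To make the bag-of-words input mentioned in the abstract above concrete, here is a minimal sketch that builds a vocabulary from a few reviews and turns a review into a count vector suitable for a fully-connected layer. The whitespace tokenization, the function names, and the English example reviews in the usage note are assumptions; the paper's own preprocessing of Indonesian text is not described in the abstract.

```python
def build_vocabulary(documents):
    """Map each distinct lowercased token to a column index, in first-seen order."""
    vocab = {}
    for doc in documents:
        for token in doc.lower().split():
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def bag_of_words(text, vocab):
    """Count vector over the vocabulary; tokens outside the vocabulary are ignored."""
    vector = [0] * len(vocab)
    for token in text.lower().split():
        index = vocab.get(token)
        if index is not None:
            vector[index] += 1
    return vector
```

With a vocabulary built from `["fast delivery good price", "bad packaging slow delivery"]`, every review becomes a fixed-length vector of seven counts. Unlike a sequence of word embeddings, this representation discards word order, which is why it pairs with a fully-connected layer rather than a recurrent one.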

Journal Article
TL;DR: A deep neural network approach is presented for extracting e-cigarette safety information from social media; it can be generalized to extract medical concepts from social media for other medical applications.

35 citations

Journal Article
TL;DR: An integrated architecture of a Convolutional Neural Network and a Long Short-Term Memory network is proposed to identify the polarity of words, with computations performed on Google Colaboratory, in order to analyze sentiments and classify opinions into positive and negative classes.
Abstract: The rapid development of social media and of specialized websites with critical product reviews has created a huge collection of resources for customers all over the world. These data may contain a lot of information, including product reviews, indications of market changes, and the polarity of opinions. Machine learning and deep learning algorithms provide the necessary tools for intelligent analysis of these challenges. In current competitive markets, it is essential to understand the opinions and sentiments of reviewers by extracting and analyzing their features. Besides, processing and analyzing this volume of data in the cloud can strongly increase the cost of the system. Fewer dependencies on expensive hardware, storage space, and related software can be provided through cloud computing and Natural Language Processing (NLP). In our work, we propose an integrated architecture of a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network to identify the polarity of words on the Google cloud, performing computations on Google Colaboratory. Our proposed model, based on deep learning algorithms with a word embedding technique, learns features through a CNN layer, and these features are fed directly into a bidirectional LSTM layer to capture long-term feature dependencies. Then, they can be reused from a CNN layer to provide abstract features before the final dense layers. The main goal of this work is to provide an appropriate solution for analyzing sentiments and classifying opinions into positive and negative classes. Our implementations show that, based on the proposed model, an accuracy of more than 89.02% is achievable.

35 citations
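
The CNN layer described in the abstract above slides filters over the sequence of word embeddings to produce local features. A minimal sketch of one such filter follows, in plain Python; the function name, the ReLU activation, and the "valid" (no padding) windowing are assumptions, and the bidirectional LSTM and dense layers of the paper's model are not reproduced here.

```python
def conv1d_features(sequence, kernel, bias=0.0):
    """Apply one 1-D convolution filter over a sequence of embedding vectors.

    sequence: list of word vectors, each a list of floats of equal length.
    kernel:   list of weight vectors, one per position in the window.
    Returns one ReLU-activated value per window ('valid' padding), or an
    empty list if the sequence is shorter than the kernel.
    """
    width = len(kernel)
    outputs = []
    for start in range(len(sequence) - width + 1):
        total = bias
        for offset in range(width):
            word_vec = sequence[start + offset]
            weights = kernel[offset]
            total += sum(x * w for x, w in zip(word_vec, weights))
        outputs.append(max(0.0, total))  # ReLU
    return outputs
```

A real layer would learn many such kernels in parallel; the resulting feature sequence is what a recurrent layer would then read step by step.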


Network Information
Related Topics (5)
Recurrent neural network: 29.2K papers, 890K citations, 87% related
Unsupervised learning: 22.7K papers, 1M citations, 86% related
Deep learning: 79.8K papers, 2.1M citations, 85% related
Reinforcement learning: 46K papers, 1M citations, 84% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 84% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    317
2022    716
2021    736
2020    1,025
2019    1,078
2018    788