
Word embedding

About: Word embedding is a research topic. Over its lifetime, 4,683 publications have been published within this topic, receiving 153,378 citations. The topic is also known as: word embeddings.


Papers
Proceedings ArticleDOI
01 Oct 2016
TL;DR: A document summarization framework based on convolutional neural networks is developed to learn sentence features and perform sentence ranking jointly, adapting the original CNN model into a regression model for sentence ranking.
Abstract: Extractive summarization aims to generate a summary by ranking sentences, and its performance relies heavily on the quality of sentence features. In this paper, a document summarization framework based on convolutional neural networks is developed to learn sentence features and perform sentence ranking jointly. We adapt the original CNN model to perform regression for sentence ranking, and use pre-trained word vectors to enhance the performance of our model. We evaluate the proposed method on the DUC 2002 and 2004 datasets, covering single-document and multi-document summarization tasks respectively. The proposed system achieves competitive or better performance compared with state-of-the-art document summarization systems.
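
Where the abstract describes regression-based sentence ranking with a CNN over pre-trained word vectors, a minimal sketch of that general architecture might look as follows; the class name, filter sizes, and ROUGE-based training targets are illustrative assumptions, not the authors' exact design.

```python
# A minimal sketch (not the paper's exact architecture) of a CNN sentence
# ranker: pre-trained word vectors feed convolutional filters whose pooled
# features are regressed onto a salience score for each sentence.
import torch
import torch.nn as nn

class CNNSentenceRanker(nn.Module):
    def __init__(self, pretrained_vectors, num_filters=100, kernel_sizes=(3, 4, 5)):
        super().__init__()
        # pretrained_vectors: (vocab_size, emb_dim) tensor, e.g. word2vec/GloVe
        self.embedding = nn.Embedding.from_pretrained(pretrained_vectors, freeze=False)
        emb_dim = pretrained_vectors.size(1)
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb_dim, num_filters, k) for k in kernel_sizes]
        )
        self.regressor = nn.Linear(num_filters * len(kernel_sizes), 1)

    def forward(self, token_ids):            # token_ids: (batch, seq_len)
        x = self.embedding(token_ids)        # (batch, seq_len, emb_dim)
        x = x.transpose(1, 2)                # Conv1d expects (batch, emb_dim, seq_len)
        pooled = [torch.relu(conv(x)).max(dim=2).values for conv in self.convs]
        features = torch.cat(pooled, dim=1)  # learned sentence features
        return self.regressor(features).squeeze(1)  # one salience score per sentence

# Training would minimize e.g. nn.MSELoss() between predicted scores and
# ROUGE-derived targets; a summary is then built from the top-ranked sentences.
```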

33 citations

Proceedings ArticleDOI
01 Aug 2017
TL;DR: An in-depth comparative study of feature-selection-based approaches and deep learning models for sentiment classification, in which feature sets combining selected bag-of-words features with averaged word-embedding features are used to investigate the effectiveness of word-embedding features.
Abstract: Classification of text documents is commonly carried out using various bag-of-words models generated with feature selection methods. In these models, the selected features are used as input to well-known classifiers such as Support Vector Machines (SVM) and neural networks. In recent years, a technique called word embeddings has been developed for text mining, and deep learning models using word embeddings have become popular for sentiment classification. However, no extensive study has been conducted to compare these approaches for sentiment classification. In this paper, we present an in-depth comparative study of the two types of approaches, feature-selection-based approaches and deep learning models, for document-level sentiment classification. Experiments were conducted using four datasets with varying characteristics. To investigate the effectiveness of word-embedding features, feature sets combining selected bag-of-words features with averaged word-embedding features were used in sentiment classification. To analyze deep learning models, we implemented three different architectures: a convolutional neural network, a long short-term memory network, and a long-term recurrent convolutional network. Our experimental results show that deep learning models performed better on three of the four datasets, while a combination of selected bag-of-words features and averaged word-embedding features gave the best performance on the remaining dataset. In addition, we show that a deep learning model initialized with either one-hot vectors or fine-tuned word embeddings performed better than the same model initialized with word embeddings that were not tuned.
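
As a rough illustration of the hybrid feature set the study evaluates, the sketch below concatenates chi-squared-selected bag-of-words features with averaged word-embedding features for an SVM; the embedding lookup, vector dimension, and number of selected features are assumptions for illustration.

```python
# A minimal sketch, assuming `embeddings` is a word -> vector dict of
# pre-trained vectors; not the paper's exact experimental pipeline.
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.svm import LinearSVC

def avg_embedding(doc, embeddings, dim=300):
    # Average the pre-trained vectors of the in-vocabulary words.
    vecs = [embeddings[w] for w in doc.split() if w in embeddings]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def build_features(docs, labels, embeddings, k=2000):
    # Bag-of-words block, pruned by chi-squared feature selection
    # (k is illustrative and must not exceed the vocabulary size).
    bow = CountVectorizer(binary=True).fit_transform(docs)
    selected = SelectKBest(chi2, k=k).fit_transform(bow, labels)
    # Averaged word-embedding block, stacked alongside the BoW block.
    emb = csr_matrix(np.vstack([avg_embedding(d, embeddings) for d in docs]))
    return hstack([selected, emb]).tocsr()

# Usage (train_docs, train_labels, embeddings are assumed to exist):
# X = build_features(train_docs, train_labels, embeddings)
# clf = LinearSVC().fit(X, train_labels)
```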

33 citations

Proceedings ArticleDOI
15 Jul 2020
TL;DR: Feature engineering methods such as count vectors, TF-IDF and word embeddings are used to generate feature vectors for detecting fake news in news articles with the assistance of machine learning and natural language processing.
Abstract: Information sharing on the web, particularly via social media, is increasing, and the ability to identify, evaluate and address such information is significantly important. Fake information, created deliberately or spread unintentionally, propagates over the internet and affects a large section of society that takes what technology presents at face value. This paper presents a model and methodology for detecting fake news in news articles with the assistance of machine learning and natural language processing. In the proposed work, different feature engineering methods, namely count vectors, TF-IDF and word embeddings, are used to generate feature vectors. Seven machine learning classification algorithms are trained to classify news as fake or real and are compared on accuracy, F1 score, recall and precision, and the best one is selected to build the final model.
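
A minimal sketch of the feature-engineering comparison described above, using scikit-learn pipelines; the abstract does not name the seven classifiers, so the three shown here are illustrative stand-ins, and labels are assumed to be binary (0 = real, 1 = fake).

```python
# Compare count-vector and TF-IDF features across several classifiers,
# scoring each pairing by cross-validated F1; a sketch, not the paper's code.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

vectorizers = {"count": CountVectorizer(), "tfidf": TfidfVectorizer()}
classifiers = {"logreg": LogisticRegression(max_iter=1000),
               "naive_bayes": MultinomialNB(),
               "linear_svm": LinearSVC()}

def compare(texts, labels):
    # The best-scoring feature/classifier combination would be kept
    # as the final fake-vs-real model.
    for vname, vec in vectorizers.items():
        for cname, clf in classifiers.items():
            f1 = cross_val_score(make_pipeline(vec, clf), texts, labels,
                                 scoring="f1", cv=5).mean()
            print(f"{vname} + {cname}: mean F1 = {f1:.3f}")
```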

33 citations

Journal ArticleDOI
TL;DR: This work explores word polarity and occurrence information through a simple mapping, encoding such information more accurately at lower computational cost, and takes advantage of the stochastic embedding technique to tackle cross-domain sentiment alignment.
Abstract: Sentiment analysis is an important topic concerning the identification of feelings, attitudes, emotions and opinions in text. To automate such analysis, a large amount of example text needs to be manually annotated for model training, which is laborious and expensive; the cross-domain technique is a key solution for reducing this cost by reusing annotated reviews across domains. However, its success largely relies on learning a robust common representation space across domains. In recent years, significant effort has been invested in improving cross-domain representation learning by designing increasingly complex and elaborate model inputs and architectures. We argue that it is not necessary to increase design complexity, as this inevitably consumes more time in model training. Instead, we propose to explore word polarity and occurrence information through a simple mapping, and to encode such information more accurately while keeping computational costs low. The proposed approach is unique in taking advantage of the stochastic embedding technique to tackle cross-domain sentiment alignment. Its effectiveness is benchmarked on over ten data tasks constructed from two review corpora, and it is compared against ten classical and state-of-the-art methods.
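
The abstract does not spell out its stochastic embedding construction, but the general mechanism it alludes to can be sketched as a Gaussian embedding layer trained with the reparameterization trick; everything below is an assumption-laden illustration, not the paper's method, and the mapping of polarity/occurrence statistics is omitted.

```python
# Sketch of a stochastic (Gaussian) embedding layer: each word is a
# distribution (mean, variance) rather than a single point vector.
import torch
import torch.nn as nn

class StochasticEmbedding(nn.Module):
    def __init__(self, vocab_size, emb_dim):
        super().__init__()
        self.mu = nn.Embedding(vocab_size, emb_dim)       # per-word mean
        self.log_var = nn.Embedding(vocab_size, emb_dim)  # per-word log-variance

    def forward(self, token_ids):
        mu = self.mu(token_ids)
        std = torch.exp(0.5 * self.log_var(token_ids))
        # Reparameterization trick: sample while keeping gradients flowing.
        return mu + std * torch.randn_like(std)
```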

32 citations

Posted Content
TL;DR: A new methodology for intrinsic evaluation of word representations is introduced, which identifies four fundamental criteria based on the characteristics of natural language that pose difficulties to NLP systems and develops tests that directly show whether or not representations contain the subspaces necessary to satisfy these criteria.
Abstract: We introduce a new methodology for intrinsic evaluation of word representations. Specifically, we identify four fundamental criteria, based on characteristics of natural language that pose difficulties to NLP systems, and develop tests that directly show whether or not representations contain the subspaces necessary to satisfy these criteria. Current intrinsic evaluations are mostly based on the overall similarity or full-space similarity of words and thus view vector representations as points. We show the limits of these point-based intrinsic evaluations. We apply our evaluation methodology to the comparison of a count-vector model and several neural network models and demonstrate important properties of these models.
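
The "point-based" evaluation the abstract criticizes can be made concrete with a short sketch: full-space cosine similarities correlated against human similarity judgments; the word-pair data and embedding lookup are assumptions.

```python
# A minimal sketch of full-space, point-based intrinsic evaluation:
# cosine similarity per word pair, Spearman-correlated with human scores.
import numpy as np
from scipy.stats import spearmanr

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def point_based_eval(pairs, human_scores, embeddings):
    # pairs: list of (word1, word2); embeddings: word -> vector dict.
    model_scores = [cosine(embeddings[a], embeddings[b]) for a, b in pairs]
    return spearmanr(model_scores, human_scores).correlation
```

The paper's subspace tests, by contrast, probe whether specific linguistic criteria are captured in subspaces of the representation rather than comparing whole vectors in the full space.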

32 citations


Network Information
Related Topics (5)
Recurrent neural network: 29.2K papers, 890K citations (87% related)
Unsupervised learning: 22.7K papers, 1M citations (86% related)
Deep learning: 79.8K papers, 2.1M citations (85% related)
Reinforcement learning: 46K papers, 1M citations (84% related)
Graph (abstract data type): 69.9K papers, 1.2M citations (84% related)
Performance
Metrics: number of papers in the topic in previous years

Year    Papers
2023    317
2022    716
2021    736
2020    1,025
2019    1,078
2018    788