
Word embedding

About: Word embedding is a research topic. Over its lifetime, 4,683 publications have been published within this topic, receiving 153,378 citations. The topic is also known as: word embeddings.


Papers
Proceedings ArticleDOI
01 Oct 2016
TL;DR: A document summarization framework based on convolutional neural networks is developed to learn sentence features and perform sentence ranking jointly, adapting the original CNN model into a regression model for sentence ranking.
Abstract: Extractive summarization aims to generate a summary by ranking sentences, and its performance relies heavily on the quality of sentence features. In this paper, a document summarization framework based on convolutional neural networks is developed to learn sentence features and perform sentence ranking jointly. We adapt the original CNN model to perform regression for sentence ranking, and use pre-trained word vectors to enhance the performance of our model. We evaluate the proposed method on the DUC 2002 and 2004 datasets, covering single-document and multi-document summarization tasks respectively. The proposed system achieves competitive or better performance compared with state-of-the-art document summarization systems.
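
Where the abstract describes regression-based sentence ranking with a CNN over pre-trained word vectors, a minimal sketch of that general architecture might look as follows; the class name, filter sizes, and ROUGE-based training targets are illustrative assumptions, not the authors' exact design.

```python
# A minimal sketch (not the paper's exact architecture) of a CNN sentence
# ranker: pre-trained word vectors feed convolutional filters whose pooled
# features are regressed onto a salience score for each sentence.
import torch
import torch.nn as nn

class CNNSentenceRanker(nn.Module):
    def __init__(self, pretrained_vectors, num_filters=100, kernel_sizes=(3, 4, 5)):
        super().__init__()
        # pretrained_vectors: (vocab_size, emb_dim) tensor, e.g. word2vec/GloVe
        self.embedding = nn.Embedding.from_pretrained(pretrained_vectors, freeze=False)
        emb_dim = pretrained_vectors.size(1)
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb_dim, num_filters, k) for k in kernel_sizes]
        )
        self.regressor = nn.Linear(num_filters * len(kernel_sizes), 1)

    def forward(self, token_ids):            # token_ids: (batch, seq_len)
        x = self.embedding(token_ids)        # (batch, seq_len, emb_dim)
        x = x.transpose(1, 2)                # Conv1d expects (batch, emb_dim, seq_len)
        pooled = [torch.relu(conv(x)).max(dim=2).values for conv in self.convs]
        features = torch.cat(pooled, dim=1)  # learned sentence features
        return self.regressor(features).squeeze(1)  # one salience score per sentence

# Training would minimize e.g. nn.MSELoss() between predicted scores and
# ROUGE-derived targets; a summary is then built from the top-ranked sentences.
```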

33 citations

Proceedings ArticleDOI
01 Aug 2017
TL;DR: An in-depth comparative study of feature-selection-based approaches and deep learning models for sentiment classification, in which feature sets combining selected bag-of-words features with averaged word-embedding features are used to investigate the effectiveness of word-embedding features.
Abstract: Classification of text documents is commonly carried out using various bag-of-words models generated with feature selection methods. In these models, the selected features are used as input to well-known classifiers such as Support Vector Machines (SVM) and neural networks. In recent years, a technique called word embeddings has been developed for text mining, and deep learning models using word embeddings have become popular for sentiment classification. However, no extensive study has been conducted to compare these approaches for sentiment classification. In this paper, we present an in-depth comparative study of the two types of approaches, feature-selection-based approaches and deep learning models, for document-level sentiment classification. Experiments were conducted using four datasets with varying characteristics. To investigate the effectiveness of word-embedding features, feature sets combining selected bag-of-words features with averaged word-embedding features were used in sentiment classification. To analyze deep learning models, we implemented three different architectures: a convolutional neural network, a long short-term memory network, and a long-term recurrent convolutional network. Our experimental results show that deep learning models performed better on three of the four datasets, while a combination of selected bag-of-words features and averaged word-embedding features gave the best performance on the remaining dataset. In addition, we show that a deep learning model initialized with either one-hot vectors or fine-tuned word embeddings performed better than the same model initialized with word embeddings that were not tuned.
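
As a rough illustration of the hybrid feature set the study evaluates, the sketch below concatenates chi-squared-selected bag-of-words features with averaged word-embedding features for an SVM; the embedding lookup, vector dimension, and number of selected features are assumptions for illustration.

```python
# A minimal sketch, assuming `embeddings` is a word -> vector dict of
# pre-trained vectors; not the paper's exact experimental pipeline.
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.svm import LinearSVC

def avg_embedding(doc, embeddings, dim=300):
    # Average the pre-trained vectors of the in-vocabulary words.
    vecs = [embeddings[w] for w in doc.split() if w in embeddings]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def build_features(docs, labels, embeddings, k=2000):
    # Bag-of-words block, pruned by chi-squared feature selection
    # (k is illustrative and must not exceed the vocabulary size).
    bow = CountVectorizer(binary=True).fit_transform(docs)
    selected = SelectKBest(chi2, k=k).fit_transform(bow, labels)
    # Averaged word-embedding block, stacked alongside the BoW block.
    emb = csr_matrix(np.vstack([avg_embedding(d, embeddings) for d in docs]))
    return hstack([selected, emb]).tocsr()

# Usage (train_docs, train_labels, embeddings are assumed to exist):
# X = build_features(train_docs, train_labels, embeddings)
# clf = LinearSVC().fit(X, train_labels)
```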

33 citations

Proceedings ArticleDOI
15 Jul 2020
TL;DR: Feature engineering methods such as count vectors, TF-IDF and word embeddings are used to generate feature vectors for detecting fake news in news articles with the assistance of machine learning and natural language processing.
Abstract: Information sharing on the web, particularly via social media, is increasing, and the ability to identify, evaluate and address such information is significantly important. Fake information, created deliberately or spread unintentionally, propagates over the internet and affects a large section of society that takes what technology presents at face value. This paper presents a model and methodology for detecting fake news in news articles with the assistance of machine learning and natural language processing. In the proposed work, different feature engineering methods, namely count vectors, TF-IDF and word embeddings, are used to generate feature vectors. Seven machine learning classification algorithms are trained to classify news as fake or real and are compared on accuracy, F1 score, recall and precision, and the best one is selected to build the final model.
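
A minimal sketch of the feature-engineering comparison described above, using scikit-learn pipelines; the abstract does not name the seven classifiers, so the three shown here are illustrative stand-ins, and labels are assumed to be binary (0 = real, 1 = fake).

```python
# Compare count-vector and TF-IDF features across several classifiers,
# scoring each pairing by cross-validated F1; a sketch, not the paper's code.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

vectorizers = {"count": CountVectorizer(), "tfidf": TfidfVectorizer()}
classifiers = {"logreg": LogisticRegression(max_iter=1000),
               "naive_bayes": MultinomialNB(),
               "linear_svm": LinearSVC()}

def compare(texts, labels):
    # The best-scoring feature/classifier combination would be kept
    # as the final fake-vs-real model.
    for vname, vec in vectorizers.items():
        for cname, clf in classifiers.items():
            f1 = cross_val_score(make_pipeline(vec, clf), texts, labels,
                                 scoring="f1", cv=5).mean()
            print(f"{vname} + {cname}: mean F1 = {f1:.3f}")
```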

33 citations

Journal ArticleDOI
TL;DR: This work explores word polarity and occurrence information through a simple mapping, encoding such information more accurately at lower computational cost, and takes advantage of the stochastic embedding technique to tackle cross-domain sentiment alignment.
Abstract: Sentiment analysis is an important topic concerning the identification of feelings, attitudes, emotions and opinions in text. To automate such analysis, a large amount of example text needs to be manually annotated for model training, which is laborious and expensive; the cross-domain technique is a key solution for reducing this cost by reusing annotated reviews across domains. However, its success largely relies on learning a robust common representation space across domains. In recent years, significant effort has been invested in improving cross-domain representation learning by designing increasingly complex and elaborate model inputs and architectures. We argue that it is not necessary to increase design complexity, as this inevitably consumes more time in model training. Instead, we propose to explore word polarity and occurrence information through a simple mapping, and to encode such information more accurately while keeping computational costs low. The proposed approach is unique in taking advantage of the stochastic embedding technique to tackle cross-domain sentiment alignment. Its effectiveness is benchmarked on over ten data tasks constructed from two review corpora, and it is compared against ten classical and state-of-the-art methods.
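
The abstract does not spell out its stochastic embedding construction, but the general mechanism it alludes to can be sketched as a Gaussian embedding layer trained with the reparameterization trick; everything below is an assumption-laden illustration, not the paper's method, and the mapping of polarity/occurrence statistics is omitted.

```python
# Sketch of a stochastic (Gaussian) embedding layer: each word is a
# distribution (mean, variance) rather than a single point vector.
import torch
import torch.nn as nn

class StochasticEmbedding(nn.Module):
    def __init__(self, vocab_size, emb_dim):
        super().__init__()
        self.mu = nn.Embedding(vocab_size, emb_dim)       # per-word mean
        self.log_var = nn.Embedding(vocab_size, emb_dim)  # per-word log-variance

    def forward(self, token_ids):
        mu = self.mu(token_ids)
        std = torch.exp(0.5 * self.log_var(token_ids))
        # Reparameterization trick: sample while keeping gradients flowing.
        return mu + std * torch.randn_like(std)
```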

32 citations

Posted Content
TL;DR: A new methodology for intrinsic evaluation of word representations is introduced, which identifies four fundamental criteria based on the characteristics of natural language that pose difficulties to NLP systems and develops tests that directly show whether or not representations contain the subspaces necessary to satisfy these criteria.
Abstract: We introduce a new methodology for intrinsic evaluation of word representations. Specifically, we identify four fundamental criteria, based on characteristics of natural language that pose difficulties to NLP systems, and develop tests that directly show whether or not representations contain the subspaces necessary to satisfy these criteria. Current intrinsic evaluations are mostly based on the overall similarity or full-space similarity of words and thus view vector representations as points. We show the limits of these point-based intrinsic evaluations. We apply our evaluation methodology to the comparison of a count-vector model and several neural network models and demonstrate important properties of these models.
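
The "point-based" evaluation the abstract criticizes can be made concrete with a short sketch: full-space cosine similarities correlated against human similarity judgments; the word-pair data and embedding lookup are assumptions.

```python
# A minimal sketch of full-space, point-based intrinsic evaluation:
# cosine similarity per word pair, Spearman-correlated with human scores.
import numpy as np
from scipy.stats import spearmanr

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def point_based_eval(pairs, human_scores, embeddings):
    # pairs: list of (word1, word2); embeddings: word -> vector dict.
    model_scores = [cosine(embeddings[a], embeddings[b]) for a, b in pairs]
    return spearmanr(model_scores, human_scores).correlation
```

The paper's subspace tests, by contrast, probe whether specific linguistic criteria are captured in subspaces of the representation rather than comparing whole vectors in the full space.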

32 citations


Network Information
Related Topics (5)
Recurrent neural network: 29.2K papers, 890K citations (87% related)
Unsupervised learning: 22.7K papers, 1M citations (86% related)
Deep learning: 79.8K papers, 2.1M citations (85% related)
Reinforcement learning: 46K papers, 1M citations (84% related)
Graph (abstract data type): 69.9K papers, 1.2M citations (84% related)
Performance
Metrics: number of papers in the topic in previous years

Year    Papers
2023    317
2022    716
2021    736
2020    1,025
2019    1,078
2018    788