Topic

Word embedding

About: Word embedding is a research topic. Over its lifetime, 4,683 publications have been published within this topic, receiving 153,378 citations. The topic is also known as: word embeddings.


Papers
Proceedings ArticleDOI
01 Aug 2017
TL;DR: This paper presents a new approach to learning a stock market lexicon from StockTwits, a popular financial social network where investors share ideas; the approach learns word polarity by predicting message sentiment with a neural network.
Abstract: Previous studies have shown that investor sentiment indicators can predict stock market change. A domain-specific sentiment lexicon and a sentiment-oriented word embedding model would help sentiment analysis in the financial domain and the stock market. In this paper, we present a new approach to learning a stock market lexicon from StockTwits, a popular financial social network where investors share ideas. It learns word polarity by predicting message sentiment, using a neural network. The sentiment-oriented word embeddings are learned from tens of millions of StockTwits posts, and this is the first study presenting sentiment-oriented word embeddings for the stock market. Experiments on predicting investor sentiment show that our lexicon outperformed lexicons built by state-of-the-art methods, and the sentiment-oriented word vectors were much better than general word embeddings.
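The core idea, learning word vectors by predicting message-level sentiment so that polarity information flows back into the embeddings, can be illustrated with a minimal sketch. This assumes a PyTorch setup; the architecture, embedding dimension, pooling choice, and two-class (bullish/bearish) label set are illustrative stand-ins, not the paper's exact model.

```python
# Minimal sketch: sentiment-oriented word embeddings learned by predicting
# message sentiment. All hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn

class SentimentEmbedder(nn.Module):
    def __init__(self, vocab_size: int, dim: int = 100):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)   # word vectors to be learned
        self.clf = nn.Linear(dim, 2)               # bullish / bearish message label

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        vecs = self.emb(token_ids)                 # (batch, seq, dim)
        msg = vecs.mean(dim=1)                     # average-pooled message representation
        return self.clf(msg)                       # predict message sentiment

# Training on (message, sentiment) pairs pushes words with similar polarity
# toward similar regions of the embedding space; word polarity can then be
# read off, e.g., by applying clf to individual word vectors.
model = SentimentEmbedder(vocab_size=50_000)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```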

49 citations

Posted Content
TL;DR: This paper proposes a self-knowledge distillation method based on the soft target probabilities of the training model itself, in which multimode information is distilled from the word embedding space right below the softmax layer.
Abstract: Since deep learning became a key player in natural language processing (NLP), many deep learning models have shown remarkable performance in a variety of NLP tasks, and in some cases they even outperform humans. Such high performance can be explained by the efficient knowledge representation of deep learning models. While many methods have been proposed to learn more efficient representations, knowledge distillation from pretrained deep networks suggests that we can use more information from the soft target probabilities to train other neural networks. In this paper, we propose a new knowledge distillation method, self-knowledge distillation, based on the soft target probabilities of the training model itself, where multimode information is distilled from the word embedding space right below the softmax layer. Due to the time complexity, our method approximates the soft target probabilities. In experiments, we applied the proposed method to two different and fundamental NLP tasks: language modeling and neural machine translation. The experimental results show that our proposed method improves performance on both tasks.
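A rough sketch of the idea, in the spirit of the abstract: build soft targets from similarities in the word embedding space below the softmax, then mix them with the usual hard-label cross-entropy. The temperature, mixing weight, and the exact way soft targets are computed here are assumptions for illustration, not the paper's formulation.

```python
# Illustrative self-distillation-style loss for a language model (PyTorch).
# Soft targets come from embedding-space similarity to the gold word, so
# plausible alternative words (multimode information) receive probability mass.
import torch
import torch.nn.functional as F

def self_distillation_loss(logits: torch.Tensor,    # (batch, vocab) model outputs
                           targets: torch.Tensor,   # (batch,) gold word ids
                           word_emb: torch.Tensor,  # (vocab, dim) output word embeddings
                           temperature: float = 2.0,
                           alpha: float = 0.5) -> torch.Tensor:
    gold_vecs = word_emb[targets]                    # (batch, dim)
    sims = gold_vecs @ word_emb.t()                  # (batch, vocab) similarity scores
    soft_targets = F.softmax(sims.detach() / temperature, dim=-1)

    hard = F.cross_entropy(logits, targets)          # usual hard-label loss
    log_probs = F.log_softmax(logits / temperature, dim=-1)
    soft = F.kl_div(log_probs, soft_targets, reduction="batchmean") * temperature ** 2
    return alpha * hard + (1.0 - alpha) * soft       # weighted mixture
```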

49 citations

Book ChapterDOI
01 Jan 2021
TL;DR: This paper provides an overview of the different types of word embedding techniques and discusses open issues and the future research scope for improving word representation; such embeddings, combined with neural network models, can increase model accuracy and excel in sentiment classification, text classification, next-sentence prediction, and other Natural Language Processing tasks.
Abstract: Word embeddings are fundamentally a form of word representation that meaningfully links human understanding of knowledge to the understanding of a machine. The representations can be a set of real numbers (a vector). Word embeddings are a distributed representation of text in an n-dimensional space that tries to capture word meanings. This paper aims to provide an overview of the different types of word embedding techniques. The review finds that there exist three dominant kinds of word embeddings, namely traditional word embeddings, static word embeddings, and contextualized word embeddings. BERT is a bidirectional transformer-based contextualized word embedding that is more efficient, as it can be pre-trained and fine-tuned. As a future scope, this word embedding, along with neural network models, can be used to increase model accuracy, and it excels in sentiment classification, text classification, next-sentence prediction, and other Natural Language Processing tasks. Some of the open issues are also discussed, along with the future research scope for the improvement of word representation.
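The distinction between static and contextualized embeddings can be made concrete with a small sketch. It assumes the gensim and Hugging Face transformers libraries are available; the GloVe and BERT checkpoints named below are common public models chosen only for illustration.

```python
# Static vs. contextualized word embeddings (illustrative sketch).
import gensim.downloader as api
import torch
from transformers import AutoTokenizer, AutoModel

# Static: one fixed vector per word, regardless of context.
glove = api.load("glove-wiki-gigaword-100")
print(glove["bank"].shape)                 # (100,) - same vector in every sentence

# Contextualized: the vector for "bank" depends on the surrounding words.
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")
for sent in ["she sat by the river bank", "he went to the bank for a loan"]:
    inputs = tok(sent, return_tensors="pt")
    with torch.no_grad():
        hidden = bert(**inputs).last_hidden_state          # (1, seq_len, 768)
    idx = inputs.input_ids[0].tolist().index(tok.convert_tokens_to_ids("bank"))
    print(sent, "->", hidden[0, idx, :3])                  # differs across contexts
```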

49 citations

Proceedings ArticleDOI
06 Sep 2015
TL;DR: It is shown that, despite the efficient word representations used within Recurrent Neural Networks, their ability to process sequences is still significantly lower than that of CRFs, while also incurring higher computational costs, and that the ability of CRFs to model output label dependencies is crucial for SLU.
Abstract: Recently, word embedding representations have been investigated for slot filling in Spoken Language Understanding, along with the use of Neural Networks as classifiers. Neural Networks, especially Recurrent Neural Networks, which are specifically adapted to sequence labeling problems, have been applied successfully on the popular ATIS database. In this work, we compare these kinds of models with the previously state-of-the-art Conditional Random Fields (CRF) classifier on a more challenging SLU database. We show that, despite the efficient word representations used within these Neural Networks, their ability to process sequences is still significantly lower than that of CRF, while also incurring higher computational costs, and that the ability of CRF to model output label dependencies is crucial for SLU.
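For readers unfamiliar with CRF-based slot filling, here is a minimal sketch on toy data. It assumes the sklearn-crfsuite package; the feature functions and the tiny ATIS-style utterance are illustrative only. The CRF's learned transition weights between adjacent BIO tags are what let it model output label dependencies.

```python
# Toy CRF slot-filling sketch (illustrative, not the paper's setup).
import sklearn_crfsuite

def word_features(tokens, i):
    # Simple per-token features; real systems use far richer feature sets.
    return {
        "word.lower": tokens[i].lower(),
        "is_first": i == 0,
        "prev_word": tokens[i - 1].lower() if i > 0 else "<bos>",
    }

tokens = ["flights", "from", "boston", "to", "denver"]
labels = ["O", "O", "B-fromloc", "O", "B-toloc"]          # BIO slot tags

X = [[word_features(tokens, i) for i in range(len(tokens))]]
y = [labels]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, y)
print(crf.predict(X))                                      # predicted tag sequence
```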

49 citations

Journal ArticleDOI
20 Jan 2017
TL;DR: A word embedding-based named entity recognition (NER) approach that significantly outperforms standard baseline CRF approaches using cluster labels of word embeddings and gazetteers constructed from Wikipedia, along with an unsupervised approach that uses an automatically created named entity (NE) gazetteer from Wikipedia in the absence of training data.
Abstract: In this article, we propose a word embedding-based named entity recognition (NER) approach. NER is commonly approached as a sequence labeling task with the application of methods such as conditional random fields (CRF). However, for low-resource languages without sufficiently large training data, methods such as CRF do not perform well. In our work, we make use of the proximity of the vector embeddings of words to approach the NER problem. The hypothesis is that word vectors belonging to the same name category, such as a person's name, occur in close vicinity in the abstract vector space of the embedded words. Assuming that this clustering hypothesis is true, we apply a standard classification approach to the vectors of words to learn a decision boundary between the NER classes. Our NER experiments are conducted on a morphologically rich and low-resource language, namely Bengali. Our approach significantly outperforms standard baseline CRF approaches that use cluster labels of word embeddings and gazetteers constructed from Wikipedia. Further, we propose an unsupervised approach that uses an automatically created named entity (NE) gazetteer from Wikipedia in the absence of training data. For a low-resource language, the word vectors obtained from Wikipedia are not sufficient to train a classifier. As a result, we propose to use the distance between the vector embeddings of words to expand the set of Wikipedia training examples with additional NEs extracted from a monolingual corpus, which yields a significant improvement in unsupervised NER performance. In fact, our expansion method performs better than the traditional CRF-based (supervised) approach (i.e., an F-score of 65.4% vs. 64.2%). Finally, we compare our proposed approach to the official submissions for the IJCNLP-2008 Bengali NER shared task and achieve an overall F-score improvement of 11.26% with respect to the best official system.
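The clustering hypothesis above, that vectors of words from the same name category lie close together, amounts to treating NER as classification of individual word vectors. The following sketch assumes pretrained word vectors are available as a dictionary; the toy data, random stand-in vectors, and the choice of logistic regression are illustrative assumptions, not the paper's exact setup.

```python
# Illustrative sketch: NER as classification of word vectors.
import numpy as np
from sklearn.linear_model import LogisticRegression

# word -> pretrained embedding (random stand-ins here, just for the sketch)
rng = np.random.default_rng(0)
vectors = {w: rng.normal(size=100) for w in
           ["rabindranath", "kolkata", "ganges", "walked", "river"]}

train_words  = ["rabindranath", "kolkata", "walked"]
train_labels = ["PERSON", "LOCATION", "O"]

X = np.stack([vectors[w] for w in train_words])
clf = LogisticRegression(max_iter=1000).fit(X, train_labels)

# Words whose vectors fall near a name cluster receive that NE label;
# nearest-neighbor distances in the same space can also expand a gazetteer.
print(clf.predict(np.stack([vectors["ganges"], vectors["river"]])))
```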

49 citations


Network Information
Related Topics (5)
Recurrent neural network: 29.2K papers, 890K citations, 87% related
Unsupervised learning: 22.7K papers, 1M citations, 86% related
Deep learning: 79.8K papers, 2.1M citations, 85% related
Reinforcement learning: 46K papers, 1M citations, 84% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 84% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2023    317
2022    716
2021    736
2020    1,025
2019    1,078
2018    788