Topic

Word embedding

About: Word embedding is a research topic. Over its lifetime, 4,683 publications have been published within this topic, receiving 153,378 citations. The topic is also known as: word embeddings.


Papers
Proceedings ArticleDOI
19 Jun 2017
TL;DR: This paper examines when exactly similarity values in word embedding models are meaningful, and proposes a method for determining which similarity values are actually meaningful for a given embedding model.
Abstract: Finding similar words with the help of word embedding models, such as Google's Word2Vec or GloVe, computed on large-scale digital libraries has yielded meaningful results in many cases. However, the underlying notion of similarity has remained ambiguous. In this paper, we examine when exactly similarity values in word embedding models are meaningful. To do so, we systematically analyze the statistical distribution of similarity values, conducting two series of experiments. The first examines how the distribution of similarity values depends on the different embedding-model algorithms and parameters. The second starts by showing that intuitive similarity thresholds do not exist. We then propose a method for determining which similarity values actually are meaningful for a given embedding model. In more abstract terms, our insights lead to a better understanding of the notion of similarity in embedding models and to more reliable evaluations of such models.

22 citations
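The core idea, empirically inspecting where "meaningful" similarity begins for a given model, can be sketched in a few lines. The following is a minimal illustration, not the authors' exact procedure: train a small Word2Vec model with gensim, sample random word pairs, and look at the percentiles of their cosine similarities. A similarity value is arguably meaningful only if it clearly exceeds what random pairs produce under that particular model. The toy corpus and sizes are placeholders.

```python
# Sketch: estimate the distribution of cosine similarities between random
# word pairs in a Word2Vec model, to see where a meaningful threshold may lie.
import random
import numpy as np
from gensim.models import Word2Vec

# Toy corpus; in practice this would be a large digital-library corpus.
corpus = [
    ["digital", "library", "search", "retrieval"],
    ["word", "embedding", "similarity", "vector"],
    ["vector", "space", "model", "retrieval"],
    ["embedding", "model", "training", "corpus"],
] * 50

model = Word2Vec(corpus, vector_size=50, window=3, min_count=1, sg=1, seed=1)

# Sample random word pairs and collect their cosine similarities.
vocab = model.wv.index_to_key
sims = np.array([
    model.wv.similarity(*random.sample(vocab, 2)) for _ in range(1000)
])

# Percentiles of the baseline distribution: observed similarities below, say,
# the 95th percentile of random pairs are hard to call "meaningful".
print("mean=%.3f  p95=%.3f  p99=%.3f"
      % (sims.mean(), np.percentile(sims, 95), np.percentile(sims, 99)))
```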

Proceedings ArticleDOI
26 May 2019
TL;DR: The authors apply word embedding techniques from natural language processing (NLP) to train embeddings for library packages ("library vectors"), which represent libraries by their context of use, as determined by the import statements present in source code.
Abstract: We consider the problem of developing suitable learning representations (embeddings) for library packages that capture semantic similarity among libraries. Such representations are known to improve the performance of downstream learning tasks (e.g. classification) and of applications such as contextual search and analogical reasoning. We apply word embedding techniques from natural language processing (NLP) to train embeddings for library packages ("library vectors"). Library vectors represent libraries by their context of use, as determined by the import statements present in source code. Experimental results obtained from training such embeddings on three large open source software corpora reveal that library vectors capture semantically meaningful relationships among software libraries, such as the relationship between frameworks and their plug-ins, and libraries commonly used together within ecosystems such as big data infrastructure projects (in Java), front-end and back-end web development frameworks (in JavaScript), and data science toolkits (in Python).

22 citations
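The recipe described in the abstract maps directly onto standard Word2Vec tooling: treat the import list of each source file as a "sentence" and train over those lists, so libraries imported in similar contexts end up with nearby vectors. A minimal sketch under that assumption (the import lists below are made up for illustration):

```python
# Sketch: "library vectors" via Word2Vec over per-file import lists.
from gensim.models import Word2Vec

# Hypothetical data: import statements extracted from Python source files.
import_lists = [
    ["numpy", "pandas", "matplotlib"],
    ["numpy", "scipy", "sklearn"],
    ["pandas", "sklearn", "matplotlib"],
    ["flask", "sqlalchemy", "jinja2"],
    ["django", "sqlalchemy"],
] * 100

# A wide window, since imports in a file have no meaningful order.
model = Word2Vec(import_lists, vector_size=32, window=10,
                 min_count=1, sg=1, seed=1)

# Libraries that co-occur across files come out as nearest neighbours.
print(model.wv.most_similar("pandas", topn=3))
```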

Book ChapterDOI
06 Apr 2020
TL;DR: Experiments reveal that the proposed Custom Weighted Word Embedding (CWWE) framework yields a significant improvement in the overall accuracy of a deep learning model in predicting information diffusion through tweets.
Abstract: Researchers have been experimenting with various drivers of the diffusion rate, such as sentiment analysis, which only considers the presence of certain words in a tweet. We theorize that the diffusion of particular content on Twitter can be driven by the sequence of nouns, adjectives, and adverbs forming a sentence. We show that the proposed approach is coherent with the intrinsic disposition of tweets toward a common choice of words when constructing a sentence to express an opinion or sentiment. In this paper, we propose a Custom Weighted Word Embedding (CWWE) to study the degree of diffusion of content (retweets on Twitter). Our framework first extracts the words and creates a matrix of these words using the sequences in the tweet text. This sequence matrix is then multiplied by custom weights based on position in the sentence: higher weights are given when impactful classes of tokens, such as nouns and adjectives, appear at the beginning of the sentence rather than at the end. We then predict the possibility of information diffusion using a Long Short-Term Memory deep neural network architecture, which is further optimized for accuracy and training execution time by a Convolutional Neural Network architecture. The results of the proposed CWWE are compared to a pre-trained GloVe word embedding. For experimentation, we created a corpus of 230,000 tweets posted by more than 45,000 users over 6 months. Experiments reveal that the proposed CWWE framework yields a significant improvement in the overall accuracy of the deep learning model in predicting information diffusion through tweets.

22 citations
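The weighting idea can be illustrated concretely. The sketch below is one plausible reading of CWWE, not the paper's exact scheme: each token's embedding is scaled by a position factor (earlier tokens weigh more) and a class factor (nouns and adjectives weigh more). The embeddings, POS tags, and factor values here are all placeholders; a real pipeline would use trained embeddings and a POS tagger such as nltk or spaCy.

```python
# Sketch: position- and POS-weighted embedding matrix for one tweet.
import numpy as np

EMB_DIM = 8
rng = np.random.default_rng(0)
embeddings = {w: rng.normal(size=EMB_DIM) for w in
              ["huge", "storm", "is", "hitting", "the", "coast"]}
# Hypothetical POS tags; a real pipeline would run a tagger here.
pos = {"huge": "ADJ", "storm": "NOUN", "is": "VERB",
       "hitting": "VERB", "the": "DET", "coast": "NOUN"}
IMPACTFUL = {"NOUN", "ADJ"}

def cwwe_matrix(tokens):
    n = len(tokens)
    rows = []
    for i, tok in enumerate(tokens):
        position_factor = (n - i) / n       # earlier tokens weigh more
        class_factor = 2.0 if pos[tok] in IMPACTFUL else 1.0
        rows.append(embeddings[tok] * position_factor * class_factor)
    return np.stack(rows)                   # (seq_len, EMB_DIM), fed to LSTM/CNN

X = cwwe_matrix(["huge", "storm", "is", "hitting", "the", "coast"])
print(X.shape)
```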

Journal ArticleDOI
TL;DR: A self-attention based hierarchical dilated convolutional neural network for multi-entity sentiment analysis (MESA), in which the task is directly transformed into a sequence labeling problem, avoiding decomposition, and which is also suitable for parallel computing.

22 citations
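Only the TL;DR is shown for this paper, so the following is a loose sketch of the named ingredients (stacked dilated convolutions plus self-attention, emitting one label per token) rather than a reconstruction of MESA itself; all layer sizes are illustrative. Unlike a recurrent tagger, every position is processed in parallel, which is the parallel-computing advantage the TL;DR mentions.

```python
# Sketch: dilated-convolution + self-attention sequence tagger (PyTorch).
import torch
import torch.nn as nn

class DilatedConvTagger(nn.Module):
    def __init__(self, vocab_size=1000, dim=64, num_labels=5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        # Hierarchy of dilated convolutions: the receptive field grows with
        # dilation while sequence length is preserved (padding == dilation).
        self.convs = nn.ModuleList([
            nn.Conv1d(dim, dim, kernel_size=3, padding=d, dilation=d)
            for d in (1, 2, 4)
        ])
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.out = nn.Linear(dim, num_labels)   # one label per token

    def forward(self, tokens):                  # tokens: (batch, seq_len)
        x = self.embed(tokens)                  # (batch, seq_len, dim)
        h = x.transpose(1, 2)                   # Conv1d wants (batch, dim, len)
        for conv in self.convs:
            h = torch.relu(conv(h))
        h = h.transpose(1, 2)
        h, _ = self.attn(h, h, h)               # self-attention over positions
        return self.out(h)                      # (batch, seq_len, num_labels)

logits = DilatedConvTagger()(torch.randint(0, 1000, (2, 12)))
print(logits.shape)  # torch.Size([2, 12, 5])
```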

Journal ArticleDOI
TL;DR: A novel deep learning model for fine-grained aspect-based opinion mining, named FGAOM, is introduced, and Multi-head Self-Attention (MSHA) is proposed to effectively fuse internal semantic text representations and take advantage of convolutional layers to model aspect-term interaction with surrounding sentiment features.
Abstract: Despite manufacturers’ great efforts to achieve customer satisfaction and improve their performance, social media opinion mining remains a big challenge. Current opinion mining requires sophisticated feature engineering and syntactic word embedding without considering the semantic interaction between aspect terms and opinionated features, which degrades the performance of most opinion mining tasks, especially those designed for smart manufacturing. Research on intelligent aspect-level opinion mining (AOM) follows the fast proliferation of user-generated data through social media for industrial manufacturing purposes. Google’s pre-trained language model, Bidirectional Encoder Representations from Transformers (BERT), outperforms existing methods on eleven natural language processing (NLP) tasks, which makes it the standard approach for semantic text representation. In this paper, we introduce a novel deep learning model for fine-grained aspect-based opinion mining, named FGAOM. First, we train the BERT model on three domain-specific corpora for domain adaptation, then use the adjusted BERT as an embedding layer for concurrent extraction of local and global context features. We then propose Multi-head Self-Attention (MSHA) to effectively fuse internal semantic text representations and take advantage of convolutional layers to model aspect-term interaction with surrounding sentiment features. Finally, the performance of the proposed model is evaluated via extensive experiments on three public datasets. Results show that the proposed model outperforms recent state-of-the-art models.

22 citations
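One piece of this pipeline, using BERT as an embedding layer with a convolutional layer over its token representations for local context features, is easy to sketch with the Hugging Face transformers library. This is only that one piece: the domain-adaptive pre-training and the MSHA fusion of FGAOM are not shown, and the generic bert-base-uncased checkpoint stands in for the domain-adapted model.

```python
# Sketch: BERT as an embedding layer feeding a convolutional feature extractor.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

# Stand-in for the domain-adapted BERT described in the paper.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

# Convolution over token representations captures local context features.
conv = nn.Conv1d(in_channels=768, out_channels=128, kernel_size=3, padding=1)

batch = tokenizer(["The battery life is great but the screen is dim."],
                  return_tensors="pt")
with torch.no_grad():
    hidden = bert(**batch).last_hidden_state        # (1, seq_len, 768)
local = torch.relu(conv(hidden.transpose(1, 2)))    # (1, 128, seq_len)
print(local.shape)
```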


Network Information
Related Topics (5)
Recurrent neural network: 29.2K papers, 890K citations, 87% related
Unsupervised learning: 22.7K papers, 1M citations, 86% related
Deep learning: 79.8K papers, 2.1M citations, 85% related
Reinforcement learning: 46K papers, 1M citations, 84% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 84% related
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    317
2022    716
2021    736
2020    1,025
2019    1,078
2018    788