scispace - formally typeset

Word embedding

About: Word embedding is a research topic. Over its lifetime, 4,683 publications within this topic have received 153,378 citations. The topic is also known as: word embeddings.


Papers
Proceedings ArticleDOI
14 May 2017
TL;DR: The main idea is to construct a weighted graph from knowledge bases (KBs) to represent structured relationships among words/concepts, and to propose GCBOW and GSkip-gram models by integrating such a graph into the original CBOW and Skip-gram models, respectively, via graph regularization.
Abstract: Word embedding in the NLP area has attracted increasing attention in recent years. The continuous bag-of-words model (CBOW) and the continuous Skip-gram model (Skip-gram) have been developed to learn distributed representations of words from a large amount of unlabeled text data. In this paper, we explore the idea of integrating extra knowledge into the CBOW and Skip-gram models and applying the new models to biomedical NLP tasks. The main idea is to construct a weighted graph from knowledge bases (KBs) to represent structured relationships among words/concepts. In particular, we propose a GCBOW model and a GSkip-gram model by integrating such a graph into the original CBOW and Skip-gram models, respectively, via graph regularization. Our experiments on four general-domain standard datasets show encouraging improvements with the new models. Further evaluations on two biomedical NLP tasks (a biomedical similarity/relatedness task and a biomedical Information Retrieval (IR) task) show that our methods perform better than the baselines.

26 citations
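The graph-regularization idea above can be illustrated with a minimal NumPy sketch, not the paper's implementation: a full-softmax skip-gram objective on a toy corpus, plus a penalty that pulls KB-linked word vectors toward each other. The vocabulary, training pairs, KB edges, and all hyperparameters are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["heart", "cardiac", "attack", "infarction", "risk"]
idx = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

# Toy (center, context) training pairs standing in for a real corpus.
pairs = [("heart", "attack"), ("cardiac", "infarction"), ("heart", "risk")]

# Toy knowledge-base graph: edges between words the KB marks as related.
kb_edges = [("heart", "cardiac"), ("attack", "infarction")]

dim, lr, lam = 8, 0.1, 0.5                 # embedding size, learning rate, graph weight
W = rng.normal(scale=0.1, size=(V, dim))   # word (input) vectors
C = rng.normal(scale=0.1, size=(V, dim))   # context (output) vectors

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for _ in range(200):
    # Skip-gram step: full-softmax cross-entropy on each (center, context) pair.
    for c, o in pairs:
        ci, oi = idx[c], idx[o]
        err = softmax(C @ W[ci]) - np.eye(V)[oi]
        grad_w = C.T @ err
        C -= lr * np.outer(err, W[ci])
        W[ci] -= lr * grad_w
    # Graph-regularization step: pull KB-linked word vectors together.
    for a, b in kb_edges:
        ai, bi = idx[a], idx[b]
        diff = W[ai] - W[bi]
        W[ai] -= lr * lam * diff
        W[bi] += lr * lam * diff

def cos(a, b):
    u, v = W[idx[a]], W[idx[b]]
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9))

# The KB-linked pair should end up with high cosine similarity.
print(cos("heart", "cardiac"))
```

The regularizer is the simplest graph penalty (a Laplacian-style pull along each edge); the paper's GCBOW/GSkip-gram formulation may weight edges and combine the terms differently.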

Proceedings ArticleDOI
26 May 2019
TL;DR: The impact of two machine learning techniques, oversampling and undersampling of data, on the training of a sentiment classifier for handling small SE datasets with a skewed distribution is investigated.
Abstract: Sentiment analysis (SA) of text-based software artifacts is increasingly used to extract information for various tasks, including providing code suggestions, improving development team productivity, recommending software packages and libraries, and commenting on defects in source code, code quality, and possibilities for improving applications. Studies of state-of-the-art sentiment analysis tools applied to software-related texts have shown varying results based on the techniques and training approaches. In this paper, we investigate the impact of two potential opportunities to improve the training for sentiment analysis of SE artifacts, in the context of neural networks customized using the Stack Overflow data developed by Lin et al. We customize the process of sentiment analysis to the software domain, using software domain-specific word embeddings learned from Stack Overflow (SO) posts, and study the impact of software domain-specific word embeddings on the performance of the sentiment analysis tool, as compared to generic word embeddings learned from Google News. We find that the word embeddings learned from the Google News data perform mostly similarly to, and in some cases better than, the word embeddings learned from SO posts. We also study the impact of two machine learning techniques, oversampling and undersampling of data, on the training of a sentiment classifier for handling small SE datasets with a skewed distribution. We find that oversampling alone, as well as oversampling and undersampling combined, helps improve the performance of a sentiment classifier.

26 citations
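The oversampling and undersampling techniques the paper studies can be sketched in a few lines of plain Python. The toy dataset and labels below are invented, and real pipelines typically use a library such as imbalanced-learn; this only shows the balancing mechanics.

```python
import random

random.seed(1)

# Toy skewed SE sentiment dataset: 12 neutral vs. 3 negative examples.
data = [("works fine", "neutral")] * 12 + [("awful bug", "negative")] * 3

def group_by_label(examples):
    by_label = {}
    for x in examples:
        by_label.setdefault(x[1], []).append(x)
    return by_label

def oversample(examples):
    """Duplicate minority-class examples until every class matches the largest."""
    by_label = group_by_label(examples)
    target = max(len(v) for v in by_label.values())
    out = []
    for items in by_label.values():
        out.extend(items)
        out.extend(random.choices(items, k=target - len(items)))
    return out

def undersample(examples):
    """Randomly drop majority-class examples until every class matches the smallest."""
    by_label = group_by_label(examples)
    target = min(len(v) for v in by_label.values())
    return [x for items in by_label.values()
            for x in random.sample(items, target)]

balanced_up = oversample(data)      # 12 neutral + 12 negative = 24
balanced_down = undersample(data)   # 3 neutral + 3 negative = 6
print(len(balanced_up), len(balanced_down))  # 24 6
```

Oversampling preserves all data at the cost of duplicated minority examples (risking overfitting), while undersampling discards majority examples; the paper finds oversampling, alone or combined with undersampling, helps on small skewed SE datasets.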

Proceedings ArticleDOI
Quanzhi Li, Sameena Shah, Armineh Nourbakhsh, Xiaomo Liu, Rui Fang
24 Oct 2016
TL;DR: A new approach to recommending hashtags for tweets is presented that uses a Learning to Rank algorithm to incorporate features built from topic-enhanced word embeddings, tweet entity data, hashtag frequency, hashtag temporal data, and tweet URL domain information.
Abstract: In this paper, we present a new approach to recommending hashtags for tweets. It uses a Learning to Rank algorithm to incorporate features built from topic-enhanced word embeddings, tweet entity data, hashtag frequency, hashtag temporal data, and tweet URL domain information. Experiments using millions of tweets and hashtags show that the proposed approach outperforms three baseline methods: the LDA-topic, tf-idf-based, and general word embedding approaches.

26 citations
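As a hedged sketch of the learning-to-rank idea (not the authors' system), the following pairwise perceptron-style ranker learns a linear scoring function over a candidate hashtag's feature vector. The four features echo the feature families named in the abstract, but all names and values are invented.

```python
# Minimal pairwise learning-to-rank sketch over invented hashtag features.
# Each candidate hashtag gets a feature vector; a linear model is trained
# so that the relevant hashtag scores above an irrelevant one.

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

# Features per candidate (all toy values): [embedding similarity,
# hashtag frequency, recency score, URL-domain match].
train_pairs = [
    # (features of correct hashtag, features of wrong hashtag)
    ([0.9, 0.5, 0.8, 1.0], [0.2, 0.9, 0.1, 0.0]),
    ([0.7, 0.3, 0.9, 0.0], [0.1, 0.8, 0.2, 0.0]),
]

w = [0.0] * 4
lr = 0.1
for _ in range(50):
    for good, bad in train_pairs:
        # Pairwise update: if the wrong candidate scores at least as
        # high as the right one, shift the weights toward "good".
        if dot(w, good) <= dot(w, bad):
            w = [wi + lr * (g - b) for wi, g, b in zip(w, good, bad)]

# An unseen relevant-looking candidate outranks an irrelevant-looking one.
print(dot(w, [0.8, 0.4, 0.7, 1.0]) > dot(w, [0.1, 0.9, 0.2, 0.0]))  # True
```

Production LTR systems use richer models (e.g. LambdaMART or RankSVM) and listwise metrics, but the core contract is the same: score candidates so that the observed hashtag ranks first.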

Journal ArticleDOI
Youli Fang, Hong Wang, Zhao Lili, Yu Fengping, Wang Caiyu
TL;DR: A dynamic knowledge graph-based method for fake-review detection, designed around the characteristics of online product reviews, which surpassed state-of-the-art results in experimental evaluations.
Abstract: Online product reviews are an important driver of customers’ purchasing behavior. Fake reviews seriously mislead consumers, challenging the fairness of the online shopping environment. Although the detection of fake reviews has progressed, several problems remain. First, fake-review recognition ignores the correlation between time and the semantics of the review texts, which is often hidden in the context of the reviews. Second, the impact of multi-source information on fake-review recognition is not considered, even though it constitutes a complex, high-dimensional, heterogeneous relationship between reviewers, reviews, stores, and commodities. To overcome these problems, the present paper proposes a dynamic knowledge graph-based method for fake-review detection. Based on the characteristics of online product reviews, it first extracts four types of entities using a newly developed neural network model called sentence vector/twin-word embedding conditioned bidirectional long short-term memory. Time-series-related features are then added to the knowledge graph construction process, forming dynamic graph networks. To enhance fake-review detection, four new indicators are defined for determining the relationships among the four types of nodes. In experimental evaluations, our method surpassed state-of-the-art results.

26 citations
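The dynamic-graph construction can be pictured with a small dict-based sketch. The entity types follow the abstract (reviewer, store, commodity, time window), but the events and the burst indicator below are invented illustrations, not the paper's four indicators.

```python
from collections import defaultdict

# Toy review events: (reviewer, store, commodity, time window, text).
events = [
    ("alice", "store1", "phoneX", "2021-03", "great phone"),
    ("alice", "store1", "phoneX", "2021-03", "best phone ever!!"),
    ("bob",   "store2", "phoneX", "2021-05", "battery drains fast"),
]

# Dynamic graph: one edge set per time window, so temporal patterns
# (e.g. a burst of edges from a single reviewer) stay visible instead
# of being flattened into one static graph.
graph = defaultdict(set)
for reviewer, store, product, window, _text in events:
    graph[window] |= {(reviewer, store), (reviewer, product), (store, product)}

# Toy indicator: how many reviews one reviewer posts on one product
# within a single window; such a burst is one hint of fake reviewing.
burst = defaultdict(int)
for reviewer, _store, product, window, _text in events:
    burst[(reviewer, product, window)] += 1

print(burst[("alice", "phoneX", "2021-03")])  # 2
```

Keying edges by time window is the minimal way to make a knowledge graph "dynamic"; the paper additionally feeds extracted entities and its own four node-relationship indicators into this construction.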

Journal ArticleDOI
TL;DR: This paper proposes an approach that uses a neural network model based on dependency-based word embeddings to automatically learn significant features from raw input for trigger classification, and achieves a semantic distributed representation of every trigger word.
Abstract: In biomedical research, events revealing complex relations between entities play an important role. Biomedical event trigger identification has become a research hotspot because of its important role in biomedical event extraction. Traditional machine learning methods, such as support vector machines (SVMs) and maxent classifiers, which rely on manually designed features fed to the classifiers, depend on an understanding of the specific task and cannot generalize to new domains or new examples. In this paper, we propose an approach that uses a neural network model based on dependency-based word embeddings to automatically learn significant features from raw input for trigger classification. First, we employ Word2vecf, a modified version of Word2vec, to learn word embeddings with rich semantic and functional information from dependency relation trees. Then a neural network architecture is used to learn more significant feature representations from the raw dependency-based word embeddings. Meanwhile, we dynamically adjust the embeddings during training to adapt them to the trigger classification task. Finally, a softmax classifier labels the examples with a specific trigger class using the features learned by the model. The experimental results show that our approach achieves a micro-averaged F1 score of 78.27% and a macro-averaged F1 score of 76.94% on significant trigger classes, and performs better than baseline methods. In addition, we obtain a semantic distributed representation of every trigger word.

25 citations
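The final classification step described above, a softmax classifier over learned embedding features, can be sketched as NumPy softmax regression. The synthetic data stands in for dependency-based embeddings of trigger candidates; nothing here reproduces the paper's architecture or its Word2vecf inputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for dependency-based embeddings of trigger
# candidates: 60 examples, 16-dim features, 3 trigger classes.
dim, n_classes, n = 16, 3, 60
X = rng.normal(size=(n, dim))
# Generate labels from a hidden linear model so the toy task is learnable.
true_W = rng.normal(size=(dim, n_classes))
y = np.argmax(X @ true_W, axis=1)

W = np.zeros((dim, n_classes))
lr = 0.5
for _ in range(300):
    # Softmax over class logits, computed in a numerically stable way.
    logits = X @ W
    logits -= logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)
    # Full-batch cross-entropy gradient step.
    onehot = np.eye(n_classes)[y]
    W -= lr * X.T @ (p - onehot) / n

acc = float((np.argmax(X @ W, axis=1) == y).mean())
print(acc)
```

In the paper, the features entering this classifier come from the neural network layers above the (dynamically fine-tuned) dependency-based embeddings, rather than directly from raw vectors as in this sketch.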


Network Information
Related Topics (5)
- Recurrent neural network: 29.2K papers, 890K citations (87% related)
- Unsupervised learning: 22.7K papers, 1M citations (86% related)
- Deep learning: 79.8K papers, 2.1M citations (85% related)
- Reinforcement learning: 46K papers, 1M citations (84% related)
- Graph (abstract data type): 69.9K papers, 1.2M citations (84% related)
Performance Metrics

No. of papers in the topic in previous years:

Year    Papers
2023    317
2022    716
2021    736
2020    1,025
2019    1,078
2018    788