scispace - formally typeset
Search or ask a question
Topic

Word embedding

About: Word embedding is a research topic. Over the lifetime, 4683 publications have been published within this topic receiving 153378 citations. The topic is also known as: word embeddings.


Papers
More filters
Posted Content
TL;DR: The Word Embedding Association Test is extended to measure bias in sentence encoders and mixed results including suspicious patterns of sensitivity that suggest the test’s assumptions may not hold in general.
Abstract: The Word Embedding Association Test shows that GloVe and word2vec word embeddings exhibit human-like implicit biases based on gender, race, and other social constructs (Caliskan et al., 2017). Meanwhile, research on learning reusable text representations has begun to explore sentence-level texts, with some sentence encoders seeing enthusiastic adoption. Accordingly, we extend the Word Embedding Association Test to measure bias in sentence encoders. We then test several sentence encoders, including state-of-the-art methods such as ELMo and BERT, for the social biases studied in prior work and two important biases that are difficult or impossible to test at the word level. We observe mixed results including suspicious patterns of sensitivity that suggest the test's assumptions may not hold in general. We conclude by proposing directions for future work on measuring bias in sentence encoders.

277 citations

Journal ArticleDOI
TL;DR: This paper reviews the latest studies that have employed deep learning to solve sentiment analysis problems, such as sentiment polarity, and models using term frequency-inverse document frequency and word embedding have been applied to a series of datasets.
Abstract: The study of public opinion can provide us with valuable information. The analysis of sentiment on social networks, such as Twitter or Facebook, has become a powerful means of learning about the users’ opinions and has a wide range of applications. However, the efficiency and accuracy of sentiment analysis is being hindered by the challenges encountered in natural language processing (NLP). In recent years, it has been demonstrated that deep learning models are a promising solution to the challenges of NLP. This paper reviews the latest studies that have employed deep learning to solve sentiment analysis problems, such as sentiment polarity. Models using term frequency-inverse document frequency (TF-IDF) and word embedding have been applied to a series of datasets. Finally, a comparative study has been conducted on the experimental results obtained for the different models and input features.

273 citations

Proceedings ArticleDOI
25 Jul 2017
TL;DR: This paper proposes a framework called Defect Prediction via Convolutional Neural Network (DP-CNN), which leverages deep learning for effective feature generation and evaluates the method on seven open source projects in terms of F-measure in defect prediction.
Abstract: To improve software reliability, software defect prediction is utilized to assist developers in finding potential bugs and allocating their testing efforts. Traditional defect prediction studies mainly focus on designing hand-crafted features, which are input into machine learning classifiers to identify defective code. However, these hand-crafted features often fail to capture the semantic and structural information of programs. Such information is important in modeling program functionality and can lead to more accurate defect prediction.In this paper, we propose a framework called Defect Prediction via Convolutional Neural Network (DP-CNN), which leverages deep learning for effective feature generation. Specifically, based on the programs' Abstract Syntax Trees (ASTs), we first extract token vectors, which are then encoded as numerical vectors via mapping and word embedding. We feed the numerical vectors into Convolutional Neural Network to automatically learn semantic and structural features of programs. After that, we combine the learned features with traditional hand-crafted features, for accurate software defect prediction. We evaluate our method on seven open source projects in terms of F-measure in defect prediction. The experimental results show that in average, DP-CNN improves the state-of-the-art method by 12%.

272 citations

Proceedings ArticleDOI
01 Jun 2019
TL;DR: This work proposes a method in which it dynamically aggregate contextualized embeddings of each unique string that the authors encounter and uses a pooling operation to distill a ”global” word representation from all contextualized instances.
Abstract: Contextual string embeddings are a recent type of contextualized word embedding that were shown to yield state-of-the-art results when utilized in a range of sequence labeling tasks. They are based on character-level language models which treat text as distributions over characters and are capable of generating embeddings for any string of characters within any textual context. However, such purely character-based approaches struggle to produce meaningful embeddings if a rare string is used in a underspecified context. To address this drawback, we propose a method in which we dynamically aggregate contextualized embeddings of each unique string that we encounter. We then use a pooling operation to distill a ”global” word representation from all contextualized instances. We evaluate these ”pooled contextualized embeddings” on common named entity recognition (NER) tasks such as CoNLL-03 and WNUT and show that our approach significantly improves the state-of-the-art for NER. We make all code and pre-trained models available to the research community for use and reproduction.

269 citations

Journal ArticleDOI
TL;DR: A unified framework to expand short texts based on word embedding clustering and convolutional neural network and semantic cliques via fast clustering is proposed, which validates the effectiveness of the proposed method on two open benchmarks.

268 citations


Network Information
Related Topics (5)
Recurrent neural network
29.2K papers, 890K citations
87% related
Unsupervised learning
22.7K papers, 1M citations
86% related
Deep learning
79.8K papers, 2.1M citations
85% related
Reinforcement learning
46K papers, 1M citations
84% related
Graph (abstract data type)
69.9K papers, 1.2M citations
84% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
2023317
2022716
2021736
20201,025
20191,078
2018788