Topic
Word embedding
About: Word embedding is a research topic. Over its lifetime, 4,683 publications have been published within this topic, receiving 153,378 citations. The topic is also known as: word embeddings.
Papers
••
01 Apr 2017
TL;DR: Compact word embedding features selected by PSO are observed to be more effective for entity extraction than the entire word embedding feature set.
Abstract: Text mining has drawn significant attention in the recent past due to the rapid growth in biomedical and clinical records. Entity extraction is one of the fundamental components of biomedical text mining. In this paper, we propose a novel approach to feature selection for entity extraction that exploits the concepts of deep learning and Particle Swarm Optimization (PSO). The system utilizes word embedding features along with several other features extracted by studying the properties of the datasets. We observe that compact word embedding features as determined by PSO are more effective than the entire word embedding feature set for entity extraction. The proposed system is evaluated on three benchmark biomedical datasets: GENIA, GENETAG, and AiMed. The effectiveness of the proposed approach is evident from significant performance gains over the baseline models as well as other existing systems. We observe improvements of 7.86, 5.27, and 7.25 F-measure points over the baseline models for the GENIA, GENETAG, and AiMed datasets, respectively.
21 citations
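The feature selection idea in the abstract above can be illustrated with a minimal sketch of a generic binary PSO that searches for a compact subset of embedding dimensions. This is not the authors' implementation: the function names, the sigmoid transfer rule, the hyperparameters, and the toy fitness function are all assumptions made for illustration. In a real setting, the fitness would presumably be the F-measure of an entity extractor trained on the selected features.

```python
import math
import random


def binary_pso(fitness, n_features, n_particles=10, n_iters=30, seed=0,
               inertia=0.7, c1=1.5, c2=1.5):
    """Generic binary PSO (illustrative): maximize fitness over 0/1 masks."""
    rng = random.Random(seed)
    positions = [[rng.randint(0, 1) for _ in range(n_features)]
                 for _ in range(n_particles)]
    velocities = [[0.0] * n_features for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    personal_score = [fitness(p) for p in positions]
    g = max(range(n_particles), key=lambda i: personal_score[i])
    global_best, global_score = personal_best[g][:], personal_score[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                # Velocity is pulled toward the personal and global bests.
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + c1 * r1 * (personal_best[i][d] - positions[i][d])
                    + c2 * r2 * (global_best[d] - positions[i][d])
                )
                # Sigmoid transfer: velocity becomes the probability of a 1.
                prob = 1.0 / (1.0 + math.exp(-velocities[i][d]))
                positions[i][d] = 1 if rng.random() < prob else 0
            score = fitness(positions[i])
            if score > personal_score[i]:
                personal_best[i], personal_score[i] = positions[i][:], score
                if score > global_score:
                    global_best, global_score = positions[i][:], score
    return global_best, global_score


# Hypothetical stand-in for a classifier's validation score: dimensions
# 0, 3 and 5 are "useful", and every selected dimension costs 0.5.
USEFUL = {0, 3, 5}


def toy_fitness(mask):
    return sum(mask[i] for i in USEFUL) - 0.5 * sum(mask)


if __name__ == "__main__":
    mask, score = binary_pso(toy_fitness, n_features=8)
    print("selected dimensions:", [i for i, bit in enumerate(mask) if bit])
    print("fitness:", score)
```

The size penalty in the toy fitness is what pushes the search toward compact masks; with a real classifier, the same effect would have to come from the evaluation metric itself or from an explicit sparsity term.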
••
01 Jun 2015
TL;DR: This paper reports submissions to the semantic textual similarity task (Task 2 in Semantic Evaluation 2015), built from various traditional features as well as novel similarity measures based on distributed word representations trained using deep learning paradigms.
Abstract: This paper reports our submissions to the semantic textual similarity task, i.e., Task 2 in Semantic Evaluation 2015. We built our systems using various traditional features, such as string-based, corpus-based, and syntactic similarity metrics, as well as novel similarity measures based on distributed word representations, which were trained using deep learning paradigms. Since the training and test datasets consist of instances collected from various domains, we explored three strategies for using the training datasets: (1) use all available training datasets and build a unified supervised model for all test datasets; (2) select the most similar training dataset and construct a separate model for each test set; (3) adopt a multi-task learning framework to make full use of the available training sets. Results on the test datasets show that using all datasets as the training set achieves the best average performance, and our best system ranks 15th out of 73.
21 citations
••
29 Jul 2016
TL;DR: Bidirectional Long Short-Term Memory with word embedding for Chinese sentiment analysis can learn past and future information and capture stronger dependency relationships, achieving 91.46% accuracy on the sentiment analysis task.
Abstract: Long Short-Term Memory networks have been successfully applied to sequence modeling tasks with strong results. However, Chinese text contains richer syntactic and semantic information and has strong intrinsic dependencies between words and phrases. In this paper, we propose Bidirectional Long Short-Term Memory (BLSTM) with word embedding for Chinese sentiment analysis. BLSTM can learn past and future information and capture stronger dependency relationships. Word embedding mainly extracts word features from raw character input and carries important syntactic and semantic information. Experimental results show that our model achieves 91.46% accuracy on the sentiment analysis task.
21 citations
••
01 Nov 2017
TL;DR: In this article, the authors propose a new methodology for hashtag recommendation for microblog posts, specifically on Twitter, based on a training-testing framework built on the concept of word embedding.
Abstract: The hashtag recommendation problem addresses recommending (suggesting) one or more hashtags to explicitly tag a post made on a given social network platform, based upon the content and context of the post. In this work, we propose a novel methodology for hashtag recommendation for microblog posts, specifically on Twitter. The methodology, EmTaggeR, is a training-testing framework built on the concept of word embedding. The training phase comprises learning word vectors associated with each hashtag and deriving a word embedding for each hashtag. We provide two training procedures: one in which each hashtag is trained with a separate word embedding model applicable in the context of that hashtag, and another in which each hashtag obtains its embedding from a global context. The testing phase consists of computing the average word embedding of the test post and finding the similarity of this embedding to the known embeddings of the hashtags. The tweets that contain the most similar hashtag are extracted, and all the hashtags that appear in these tweets are ranked by embedding similarity score. The top-K hashtags in this ranked list are recommended for the given test post. Our system produces an F1 score of 50.83%, improving over the LDA baseline by around 6.53 times and outperforming the best-performing system known in the literature, which provides a lift of 6.42 times. EmTaggeR is a fast, scalable, and lightweight system, which makes it practical to deploy in real-life applications.
21 citations
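The testing phase described in the abstract above can be sketched in a few lines. This is a minimal illustration, not the EmTaggeR code: the function names, the use of cosine similarity, and the toy vectors are assumptions, and the hashtag embeddings are taken as given rather than trained.

```python
import math


def average_embedding(tokens, word_vectors):
    """Average the vectors of known tokens; None if no token is known."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity (assumed here as the similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend_hashtags(post_tokens, word_vectors, hashtag_vectors, tweets, k=3):
    """Rank hashtags co-occurring with the hashtag most similar to the post.

    `tweets` is a list of hashtag sets, one set per training tweet.
    """
    post_vec = average_embedding(post_tokens, word_vectors)
    if post_vec is None:
        return []
    scores = {tag: cosine(post_vec, vec) for tag, vec in hashtag_vectors.items()}
    best = max(scores, key=scores.get)
    # Collect every hashtag appearing in tweets that contain the best match.
    candidates = set()
    for tweet_tags in tweets:
        if best in tweet_tags:
            candidates.update(t for t in tweet_tags if t in scores)
    ranked = sorted(candidates, key=lambda t: scores[t], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Toy two-dimensional vectors, invented purely for illustration.
    word_vectors = {"goal": [1.0, 0.0], "match": [0.9, 0.1], "vote": [0.0, 1.0]}
    hashtag_vectors = {
        "#football": [1.0, 0.1],
        "#sports": [0.8, 0.3],
        "#election": [0.0, 1.0],
    }
    tweets = [{"#football", "#sports"}, {"#election"}]
    print(recommend_hashtags(["goal", "match"], word_vectors, hashtag_vectors, tweets))
```

Tokens without a known vector are simply skipped when averaging, which is one possible way to handle out-of-vocabulary words; the abstract does not say how the original system treats them.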
••
14 Apr 2020
TL;DR: This paper proposes a framework for identifying gender bias in training data for machine learning. It draws upon gender theory and sociolinguistics to systematically indicate levels of bias in textual training data and associated neural word embedding models, thus highlighting pathways for both removing bias from training data and critically assessing its impact.
Abstract: Algorithmic bias has the capacity to amplify and perpetuate societal bias, and presents profound ethical implications for society. Gender bias in algorithms has been identified in the context of employment advertising and recruitment tools, due to their reliance on underlying language processing and recommendation algorithms. Attempts to address such issues have involved testing learned associations, integrating concepts of fairness into machine learning, and performing more rigorous analysis of training data. Mitigating bias when algorithms are trained on textual data is particularly challenging given the complex way gender ideology is embedded in language. This paper proposes a framework for the identification of gender bias in training data for machine learning. The work draws upon gender theory and sociolinguistics to systematically indicate levels of bias in textual training data and associated neural word embedding models, thus highlighting pathways for both removing bias from training data and critically assessing its impact in the context of search and recommender systems.
21 citations