scispace - formally typeset

Word embedding

About: Word embedding is a research topic. Over its lifetime, 4,683 publications have been published within this topic, receiving 153,378 citations. The topic is also known as: word embeddings.


Papers
Proceedings ArticleDOI
01 Apr 2017
TL;DR: An interesting observation is that compact word embedding features, as determined by PSO, are more effective than the entire word embedding feature set for entity extraction.
Abstract: Text mining has drawn significant attention in the recent past due to the rapid growth in biomedical and clinical records. Entity extraction is one of the fundamental components of biomedical text mining. In this paper, we propose a novel feature-selection approach for entity extraction that exploits the concepts of deep learning and Particle Swarm Optimization (PSO). The system utilizes word embedding features along with several other features extracted by studying the properties of the datasets. We make the interesting observation that compact word embedding features, as determined by PSO, are more effective than the entire word embedding feature set for entity extraction. The proposed system is evaluated on three benchmark biomedical datasets: GENIA, GENETAG, and AiMed. The effectiveness of the proposed approach is evident from significant performance gains over the baseline models as well as other existing systems. We observe improvements of 7.86%, 5.27% and 7.25% F-measure points over the baseline models for the GENIA, GENETAG, and AiMed datasets, respectively.
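The abstract does not give implementation details, but the core idea it describes, using binary Particle Swarm Optimization to pick a compact subset of embedding dimensions, can be sketched generically. Everything below (particle count, coefficients, and the toy fitness function) is an illustrative assumption, not the paper's configuration.

```python
import math
import random

random.seed(0)

N_FEATURES = 20   # candidate (embedding) dimensions to select from
N_PARTICLES = 10
N_ITERS = 30

def fitness(mask):
    # Toy objective: pretend only the first 5 dimensions are informative,
    # and penalise large subsets so compact feature sets win.
    return sum(mask[:5]) - 0.1 * sum(mask)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Binary positions (feature masks) and real-valued velocities.
positions = [[random.randint(0, 1) for _ in range(N_FEATURES)]
             for _ in range(N_PARTICLES)]
velocities = [[0.0] * N_FEATURES for _ in range(N_PARTICLES)]
pbest = [p[:] for p in positions]
gbest = max(positions, key=fitness)[:]

for _ in range(N_ITERS):
    for i in range(N_PARTICLES):
        for d in range(N_FEATURES):
            r1, r2 = random.random(), random.random()
            v = velocities[i][d] \
                + 2.0 * r1 * (pbest[i][d] - positions[i][d]) \
                + 2.0 * r2 * (gbest[d] - positions[i][d])
            velocities[i][d] = max(-6.0, min(6.0, v))  # clamp velocity
            # Binary PSO: sigmoid(velocity) is the probability the bit is 1.
            positions[i][d] = 1 if random.random() < sigmoid(velocities[i][d]) else 0
        if fitness(positions[i]) > fitness(pbest[i]):
            pbest[i] = positions[i][:]
        if fitness(positions[i]) > fitness(gbest):
            gbest = positions[i][:]

print(sum(gbest), "of", N_FEATURES, "features selected")
```

In a real entity-extraction setup, the fitness of a mask would instead be the F-measure of a tagger trained on the masked feature set.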

21 citations

Proceedings ArticleDOI
01 Jun 2015
TL;DR: This paper reports submissions to the semantic textual similarity task, i.e., Task 2 in Semantic Evaluation 2015, using various traditional features as well as novel similarity measures based on distributed word representations trained using deep learning paradigms.
Abstract: This paper reports our submissions to the semantic textual similarity task, i.e., Task 2 in Semantic Evaluation 2015. We built our systems using various traditional features, such as string-based, corpus-based and syntactic similarity metrics, as well as novel similarity measures based on distributed word representations, which were trained using deep learning paradigms. Since the training and test datasets consist of instances collected from various domains, three different strategies for using the training datasets were explored: (1) use all available training datasets and build a unified supervised model for all test datasets; (2) select the most similar training dataset and separately construct an individual model for each test set; (3) adopt a multi-task learning framework to make full use of the available training sets. Results on the test datasets show that using all datasets as the training set achieves the best averaged performance, and our best system ranks 15 out of 73.
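One family of the embedding-based similarity measures mentioned above can be sketched as cosine similarity between averaged word vectors. The tiny hand-made vectors below are stand-ins for embeddings trained with deep learning; the paper's actual feature set is richer.

```python
import math

# Toy pre-trained word vectors (stand-ins for learned embeddings).
vectors = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}

def sentence_vector(tokens):
    # Average the vectors of in-vocabulary tokens.
    known = [vectors[t] for t in tokens if t in vectors]
    dims = len(next(iter(vectors.values())))
    if not known:
        return [0.0] * dims
    return [sum(v[d] for v in known) / len(known) for d in range(dims)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

sim_close = cosine(sentence_vector(["cat"]), sentence_vector(["dog"]))
sim_far = cosine(sentence_vector(["cat"]), sentence_vector(["car"]))
assert sim_close > sim_far  # related words yield higher similarity
```

This single score would then be one feature among the string-based, corpus-based and syntactic metrics fed to the supervised model.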

21 citations

Book ChapterDOI
29 Jul 2016
TL;DR: Bidirectional Long Short-Term Memory with word embedding for Chinese sentiment analysis can learn past and future information and capture stronger dependency relationships, achieving 91.46% accuracy on the sentiment analysis task.
Abstract: Long Short-Term Memory networks have been successfully applied to sequence modeling tasks and have obtained great achievements. However, Chinese text contains richer syntactic and semantic information and has strong intrinsic dependencies between words and phrases. In this paper, we propose Bidirectional Long Short-Term Memory (BLSTM) with word embedding for Chinese sentiment analysis. BLSTM can learn past and future information and capture stronger dependency relationships. Word embedding mainly extracts word features from raw character input and carries important syntactic and semantic information. Experimental results show that our model achieves 91.46% accuracy on the sentiment analysis task.
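As a rough illustration of how a BLSTM combines past and future context, here is a minimal NumPy forward pass: one LSTM run left-to-right, one right-to-left, with the final hidden states concatenated. The random weights, the sizes, and the weight sharing between directions are simplifications for brevity; a real BLSTM learns separate, trained parameters per direction.

```python
import numpy as np

rng = np.random.default_rng(0)
EMB, HID = 4, 3  # embedding and hidden sizes (illustrative)

def lstm_step(x, h, c, W, U, b):
    # Standard LSTM cell: input, forget, output gates and candidate state.
    z = W @ x + U @ h + b
    i, f, o, g = np.split(z, 4)
    i, f, o = 1/(1+np.exp(-i)), 1/(1+np.exp(-f)), 1/(1+np.exp(-o))
    c = f * c + i * np.tanh(g)
    return o * np.tanh(c), c

def run_direction(seq, W, U, b):
    h, c = np.zeros(HID), np.zeros(HID)
    for x in seq:
        h, c = lstm_step(x, h, c, W, U, b)
    return h  # final hidden state for this direction

W = rng.normal(size=(4 * HID, EMB))
U = rng.normal(size=(4 * HID, HID))
b = np.zeros(4 * HID)

sentence = [rng.normal(size=EMB) for _ in range(5)]  # 5 word embeddings
h_fwd = run_direction(sentence, W, U, b)             # past context
h_bwd = run_direction(sentence[::-1], W, U, b)       # future context
features = np.concatenate([h_fwd, h_bwd])            # BLSTM representation
print(features.shape)  # (6,): fed to a sentiment classifier
```

The concatenated vector is what a final classification layer would consume to predict sentiment polarity.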

21 citations

Proceedings ArticleDOI
01 Nov 2017
TL;DR: In this article, the authors propose a new methodology for hashtag recommendation for microblog posts, specifically Twitter, based on a training-testing framework built on top of the concept of word embedding.
Abstract: The hashtag recommendation problem addresses recommending (suggesting) one or more hashtags to explicitly tag a post made on a given social network platform, based upon the content and context of the post. In this work, we propose a novel methodology for hashtag recommendation for microblog posts, specifically Twitter. The methodology, EmTaggeR, is built upon a training-testing framework that builds on top of the concept of word embedding. The training phase comprises learning word vectors associated with each hashtag, and deriving a word embedding for each hashtag. We provide two training procedures: one in which each hashtag is trained with a separate word embedding model applicable in the context of that hashtag, and another in which each hashtag obtains its embedding from a global context. The testing phase consists of computing the average word embedding of the test post, and finding the similarity of this embedding with the known embeddings of the hashtags. The tweets that contain the most-similar hashtag are extracted, and all the hashtags that appear in these tweets are ranked in terms of embedding similarity scores. The top-K hashtags that appear in this ranked list are recommended for the given test post. Our system produces an F1 score of 50.83%, improving over the LDA baseline by around 6.53 times, outperforming the best-performing system known in the literature that provides a lift of 6.42 times. EmTaggeR is a fast, scalable and lightweight system, which makes it practical to deploy in real-life applications.
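The testing phase described above (average the post's word vectors, rank candidate hashtags by embedding similarity, recommend the top K) can be sketched as follows. The toy vectors and hashtags are invented for illustration and are not EmTaggeR's trained models; the intermediate tweet-extraction step is also omitted.

```python
import math

# Toy word and hashtag embeddings (illustrative stand-ins).
word_vecs = {
    "goal": [0.9, 0.1], "match": [0.8, 0.2],
    "ballot": [0.1, 0.9], "vote": [0.2, 0.8],
}
hashtag_vecs = {
    "#football": [0.85, 0.15],
    "#election": [0.15, 0.85],
}

def average(vecs):
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def recommend(tokens, k=1):
    # Average the post's in-vocabulary word vectors, then rank hashtags
    # by cosine similarity to that average.
    post = average([word_vecs[t] for t in tokens if t in word_vecs])
    ranked = sorted(hashtag_vecs,
                    key=lambda h: cosine(post, hashtag_vecs[h]),
                    reverse=True)
    return ranked[:k]

print(recommend(["great", "goal", "match"]))  # ['#football']
```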

21 citations

Book ChapterDOI
14 Apr 2020
TL;DR: This paper proposes a framework for the identification of gender bias in training data for machine learning, drawing upon gender theory and sociolinguistics to systematically indicate levels of bias in textual training data and associated neural word embedding models, thus highlighting pathways for both removing bias from training data and critically assessing its impact.
Abstract: Algorithmic bias has the capacity to amplify and perpetuate societal bias, and presents profound ethical implications for society. Gender bias in algorithms has been identified in the context of employment advertising and recruitment tools, due to their reliance on underlying language processing and recommendation algorithms. Attempts to address such issues have involved testing learned associations, integrating concepts of fairness into machine learning, and performing more rigorous analysis of training data. Mitigating bias when algorithms are trained on textual data is particularly challenging given the complex way gender ideology is embedded in language. This paper proposes a framework for the identification of gender bias in training data for machine learning. The work draws upon gender theory and sociolinguistics to systematically indicate levels of bias in textual training data and associated neural word embedding models, thus highlighting pathways for both removing bias from training data and critically assessing its impact in the context of search and recommender systems.
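The paper's full framework is not reproduced here, but a common, much simpler diagnostic for gender bias in word embeddings is projecting word vectors onto a he-she direction; a strongly signed projection suggests a gendered association. The two-dimensional vectors below are fabricated purely for illustration.

```python
import math

# Fabricated 2-D embeddings chosen to exhibit a visible gender skew.
vecs = {
    "he":       [1.0, 0.0],
    "she":      [-1.0, 0.0],
    "nurse":    [-0.6, 0.8],
    "engineer": [0.7, 0.7],
}

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

# Gender direction: the difference between the "he" and "she" vectors.
g = normalize([a - b for a, b in zip(vecs["he"], vecs["she"])])

def bias(word):
    # Signed projection onto the gender axis (positive leans "he").
    return sum(x * y for x, y in zip(normalize(vecs[word]), g))

print(round(bias("nurse"), 2), round(bias("engineer"), 2))
# nurse leans "she", engineer leans "he" in this toy data
```

More careful analyses, like the sociolinguistic framework the paper proposes, go well beyond a single projection axis.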

21 citations


Network Information
Related Topics (5)
- Recurrent neural network: 29.2K papers, 890K citations (87% related)
- Unsupervised learning: 22.7K papers, 1M citations (86% related)
- Deep learning: 79.8K papers, 2.1M citations (85% related)
- Reinforcement learning: 46K papers, 1M citations (84% related)
- Graph (abstract data type): 69.9K papers, 1.2M citations (84% related)
Performance
Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    317
2022    716
2021    736
2020    1,025
2019    1,078
2018    788