Topic

Word embedding

About: Word embedding is a research topic. Over its lifetime, 4,683 publications have been published within this topic, receiving 153,378 citations. The topic is also known as: word embeddings.


Papers
Proceedings ArticleDOI
13 Mar 2019
TL;DR: Comparing word2vec Continuous Bag of Words (CBOW), word2vec skip-gram, doc2vec, and GloVe, the results show that GloVe is the best word embedding method for hotel review data.
Abstract: The development of information technology has made data production increase dramatically. We can get lots of data from the internet, including review data about a product or service. The more data obtained, the more a system is needed to process it. Sentiment analysis is a text-processing task in Natural Language Processing (NLP) that can help assess the quality of a service offered, including hotel services. This paper uses hotel review data obtained from the Traveloka website to carry out sentiment analysis. The data are classified using the Long Short-Term Memory (LSTM) algorithm. To get better results, the authors use word embedding to convert words into vectors. This study aims to compare the performance of several word embedding methods: word2vec Continuous Bag of Words (CBOW), word2vec skip-gram, doc2vec, and GloVe. From the experiments conducted, GloVe has the highest accuracy at 95.52% while word2vec skip-gram has the lowest at 91.81%, so it is concluded that GloVe is the best word embedding method for hotel review data.

19 citations
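
As a rough illustration of the pipeline this abstract describes, the sketch below trains word2vec with gensim on a toy tokenized corpus (sg=0 for CBOW; sg=1 would give skip-gram) and feeds the frozen vectors to a Keras LSTM classifier. The reviews, labels, and hyperparameters are placeholders, not the paper's actual Traveloka setup.

```python
import numpy as np
import tensorflow as tf
from gensim.models import Word2Vec

# Hypothetical tokenized reviews and labels (1 = positive, 0 = negative);
# the paper's actual corpus comes from Traveloka hotel reviews.
reviews = [["great", "hotel", "clean", "room", "friendly", "staff"],
           ["terrible", "service", "dirty", "room"]]
labels = np.array([1, 0])

# sg=0 trains CBOW; sg=1 would train skip-gram instead.
w2v = Word2Vec(reviews, vector_size=100, window=5, min_count=1, sg=0)

# Map words to indices (0 reserved for padding) and build the embedding matrix.
vocab = {w: i + 1 for i, w in enumerate(w2v.wv.index_to_key)}
emb = np.zeros((len(vocab) + 1, 100))
for w, i in vocab.items():
    emb[i] = w2v.wv[w]

# Pad index sequences to a fixed length.
maxlen = 10
seqs = [[vocab[w] for w in r] for r in reviews]
x = np.array([s + [0] * (maxlen - len(s)) for s in seqs])

# LSTM classifier on top of the frozen pre-trained vectors.
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(emb.shape[0], 100, trainable=False),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.build(input_shape=(None, maxlen))
model.layers[0].set_weights([emb])  # load the word2vec vectors
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x, labels, epochs=5)
```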

Journal ArticleDOI
TL;DR: This paper proposes a framework called Semantic Feature Learning via Dual Sequences (SFLDS), which captures the semantic and structural information in the Abstract Syntax Tree (AST) for feature generation.
Abstract: Software defect prediction (SDP) can help developers reasonably allocate limited resources for locating bugs and prioritizing their testing efforts. Existing methods often serialize an Abstract Syntax Tree (AST) obtained from the program source code into a token sequence, which is then input into a deep learning model to learn semantic features. However, different ASTs can share the same token sequence, and it is impossible to distinguish the tree structure of the ASTs from a token sequence alone. To solve this problem, this paper proposes a framework called Semantic Feature Learning via Dual Sequences (SFLDS), which can capture the semantic and structural information in the AST for feature generation. Specifically, we select the representative nodes in the AST and convert the program source code into a simplified AST (S-AST). Our method introduces two sequences to represent the semantic and structural information of the S-AST: one is the pre-order traversal of the S-AST nodes, and the other is composed of their parent nodes. Each token in the dual sequences is then encoded as a numerical vector via mapping and word embedding. Finally, we use a bi-directional long short-term memory (BiLSTM) based neural network to automatically generate semantic features from the dual sequences for SDP. In addition, to leverage the statistical characteristics contained in handcrafted metrics, we also propose a framework called Defect Prediction via SFLDS (DP-SFLDS), which combines the semantic features generated by SFLDS with handcrafted metrics to perform SDP. In our empirical studies, eight open-source Java projects from the PROMISE repository are chosen as empirical subjects. Experimental results show that our proposed approach can perform better than several state-of-the-art baseline SDP methods.

19 citations
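
To make the dual-sequence idea concrete, here is a minimal sketch that derives both sequences from an AST: a pre-order token sequence plus a parallel sequence of each node's parent. The paper works on Java ASTs simplified to an S-AST; Python's ast module is used here purely to illustrate the traversal, and the resulting tokens would then be mapped to indices and embedded as the abstract describes.

```python
import ast

def dual_sequences(source):
    """Return (pre-order token sequence, parallel parent-node sequence)."""
    tree = ast.parse(source)
    tokens, parents = [], []

    def visit(node, parent_name):
        name = type(node).__name__
        tokens.append(name)        # pre-order: record the node first ...
        parents.append(parent_name)
        for child in ast.iter_child_nodes(node):
            visit(child, name)     # ... then recurse into its children

    visit(tree, "ROOT")
    return tokens, parents

toks, pars = dual_sequences("def f(x):\n    return x + 1")
print(list(zip(toks, pars)))
# e.g. ('Module', 'ROOT'), ('FunctionDef', 'Module'), ('Return', 'FunctionDef'), ...
```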

Journal ArticleDOI
TL;DR: It was observed that the algorithm document to vector rule-based (D2vecRule) was good when compared with other algorithms such as JRip, One R, and ZeroR applied to the same Reuters-21578 dataset.
Abstract: With the growth of online information and sudden expansion in the number of electronic documents provided on websites and in electronic libraries, there is difficulty in categorizing text documents. Therefore, a rule-based approach is a solution to this problem; the purpose of this study is to classify documents by using a rule-based. This paper deals with the rule-based approach with the embedding technique for a document to vector (doc2vec) files. An experiment was performed on two data sets Reuters-21578 and the 20 Newsgroups to classify the top ten categories of these data sets by using a document to vector rule-based (D2vecRule). Finally, this method provided us a good classification result according to the F-measures and implementation time metrics. In conclusion, it was observed that our algorithm document to vector rule-based (D2vecRule) was good when compared with other algorithms such as JRip, One R, and ZeroR applied to the same Reuters-21578 dataset.

19 citations
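
A minimal sketch of the doc2vec feature step, using gensim. The paper's own rule learner is not reimplemented here; a decision tree, which also yields if-then rules, stands in for it, and the four toy documents stand in for Reuters-21578 texts, so the prediction is illustrative rather than meaningful.

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.tree import DecisionTreeClassifier

# Toy stand-ins for Reuters-21578 documents and category labels.
docs = ["oil prices rose sharply", "wheat exports fell",
        "crude futures climbed", "grain harvest declined"]
labels = ["oil", "grain", "oil", "grain"]

# Train doc2vec: each document becomes a fixed-length vector.
tagged = [TaggedDocument(d.split(), [i]) for i, d in enumerate(docs)]
d2v = Doc2Vec(tagged, vector_size=50, min_count=1, epochs=40)

# Fit a rule-style classifier on the inferred document vectors.
X = [d2v.infer_vector(d.split()) for d in docs]
clf = DecisionTreeClassifier().fit(X, labels)
print(clf.predict([d2v.infer_vector("oil market surged".split())]))
```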

Posted Content
TL;DR: This paper shows that the arithmetic mean of two distinct word embedding sets yields a performant meta-embedding that is comparable to or better than more complex meta-embedding learning methods.
Abstract: Creating accurate meta-embeddings from pre-trained source embeddings has received attention lately. Methods based on global and locally linear transformation and concatenation have been shown to produce accurate meta-embeddings. In this paper, we show that the arithmetic mean of two distinct word embedding sets yields a performant meta-embedding that is comparable to or better than more complex meta-embedding learning methods. The result seems counter-intuitive given that vector spaces in different source embeddings are not comparable and cannot simply be averaged. We give insight into why averaging can still produce accurate meta-embeddings despite the incomparability of the source vector spaces.

19 citations
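
The averaging operation itself is a one-liner; the sketch below computes the mean over the shared vocabulary, with toy two-dimensional vectors standing in for pre-trained sets such as GloVe and word2vec (the method assumes the sources share a dimensionality; handling of out-of-vocabulary words is left aside here).

```python
import numpy as np

# Toy source embeddings standing in for two pre-trained sets.
glove = {"hotel": np.array([0.2, 0.8]), "room": np.array([0.5, 0.1])}
w2v   = {"hotel": np.array([0.6, 0.4]), "room": np.array([0.3, 0.7])}

# Meta-embedding: the arithmetic mean over the common vocabulary.
meta = {w: (glove[w] + w2v[w]) / 2.0 for w in glove.keys() & w2v.keys()}
print(meta["hotel"])  # -> [0.4 0.6]
```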

Proceedings ArticleDOI
20 Dec 2019
TL;DR: It is shown that many gender-neutral words in Hindi are mapped to vectors that are inclined towards one gender or the other in multi-dimensional space, and a new debiasing algorithm is proposed that is applicable to any language.
Abstract: Word embedding is a major machine learning technique for computational applications of language. For a given corpus, the word-embedding process embeds each word into a multi-dimensional space such that semantic similarities between similar words are retained. While learning the similarities encapsulated in the training corpus, the embedding process inadvertently captures many other features inherent in the corpus. One such feature is the bias arising from stereotyping, present in almost all corpora no matter how extensively used and trusted they are. We study this aspect of word embedding in the context of the Hindi language. We show that many gender-neutral words in Hindi are mapped to vectors that are inclined towards one gender or the other in multi-dimensional space. We propose a new debiasing algorithm and demonstrate its efficacy in the context of the Hindi language. Further, we build an SVM-based classifier that determines whether a gender-neutral word is classified as neutral or otherwise. We corroborate our claim with experimental results on a large number of individual words. This work is the first result on debiasing in the Hindi language, and our new debiasing algorithm is applicable to any language.

19 citations
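
The abstract does not spell out the paper's Hindi-specific algorithm, so the sketch below shows only the generic projection step that debiasing methods in this family build on (in the spirit of hard debiasing): estimate a gender direction from a seed pair and remove its component from gender-neutral vectors. All vectors are toy stand-ins, not real embeddings.

```python
import numpy as np

def debias(vec, g):
    """Remove from vec its component along the (normalized) direction g."""
    g = g / np.linalg.norm(g)
    return vec - np.dot(vec, g) * g

he  = np.array([0.9, 0.1, 0.3])   # toy stand-ins for the embeddings of a
she = np.array([0.1, 0.9, 0.3])   # male/female seed pair
g = he - she                      # estimated gender direction

doctor = np.array([0.7, 0.3, 0.5])  # hypothetical gender-neutral word vector
print(debias(doctor, g))            # now has zero projection on g
```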


Network Information
Related Topics (5)
Recurrent neural network: 29.2K papers, 890K citations, 87% related
Unsupervised learning: 22.7K papers, 1M citations, 86% related
Deep learning: 79.8K papers, 2.1M citations, 85% related
Reinforcement learning: 46K papers, 1M citations, 84% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 84% related
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    317
2022    716
2021    736
2020    1,025
2019    1,078
2018    788