scispace - formally typeset
Topic

Word embedding

About: Word embedding is a research topic. Over its lifetime, 4,683 publications have been published within this topic, receiving 153,378 citations. The topic is also known as: word embeddings.


Papers
Proceedings ArticleDOI
06 Jul 2020
TL;DR: This work addresses the task of unsupervised Semantic Textual Similarity (STS) by ensembling diverse pre-trained sentence encoders into sentence meta-embeddings, and applies, extends, and evaluates different meta-embedding methods from the word embedding literature at the sentence level, including dimensionality reduction and generalized Canonical Correlation Analysis.
Abstract: We address the task of unsupervised Semantic Textual Similarity (STS) by ensembling diverse pre-trained sentence encoders into sentence meta-embeddings. We apply, extend and evaluate different meta-embedding methods from the word embedding literature at the sentence level, including dimensionality reduction (Yin and Schutze, 2016), generalized Canonical Correlation Analysis (Rastogi et al., 2015) and cross-view auto-encoders (Bollegala and Bao, 2018). Our sentence meta-embeddings set a new unsupervised state of the art (SoTA) on the STS Benchmark and on the STS12-STS16 datasets, with gains of between 3.7% and 6.4% Pearson's r over single-source systems.
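The abstract above describes building sentence meta-embeddings by combining the outputs of several encoders and then reducing dimensionality. A minimal numpy sketch of that idea, using concatenation of L2-normalized source embeddings followed by truncated SVD; the encoder outputs and dimensions below are invented toy data, not the paper's actual systems:

```python
import numpy as np

def meta_embed(embedding_sets, out_dim=4):
    """Combine sentence embeddings from several encoders into one
    meta-embedding: L2-normalize each source, concatenate, then reduce
    dimensionality with a truncated SVD."""
    normed = []
    for E in embedding_sets:  # each E has shape (n_sentences, dim_i)
        E = np.asarray(E, dtype=float)
        E = E / np.linalg.norm(E, axis=1, keepdims=True)
        normed.append(E)
    concat = np.hstack(normed)            # (n_sentences, sum of dim_i)
    concat -= concat.mean(axis=0)         # center before SVD
    U, S, Vt = np.linalg.svd(concat, full_matrices=False)
    return concat @ Vt[:out_dim].T        # (n_sentences, out_dim)

# toy example: 6 "sentences" encoded by two hypothetical encoders
rng = np.random.default_rng(0)
meta = meta_embed([rng.normal(size=(6, 8)), rng.normal(size=(6, 5))], out_dim=4)
print(meta.shape)  # (6, 4)
```

SVD stands in here as the simplest of the adapted techniques; the paper also evaluates generalized CCA and cross-view auto-encoders as reduction methods.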

23 citations

Journal ArticleDOI
Mamdouh Farouk
TL;DR: The proposed approach combines different similarity measures in the calculation of sentence similarity and exploits sentence semantic structure to improve the accuracy of the sentence similarity calculation.
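The TL;DR describes combining different similarity measures into one sentence-similarity score. A hedged sketch of what such a combination can look like, mixing a lexical measure (Jaccard word overlap) with a semantic measure (cosine of averaged word vectors) under an assumed weight `alpha`; the paper's actual measures, its use of sentence semantic structure, and its weights are not specified here:

```python
import numpy as np

def jaccard(s1, s2):
    """Lexical overlap between the word sets of two sentences."""
    a, b = set(s1.split()), set(s2.split())
    return len(a & b) / len(a | b)

def cosine(v1, v2):
    """Cosine similarity of two vectors."""
    return float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))

def combined_similarity(s1, s2, word_vecs, alpha=0.5):
    """Weighted mix of a lexical and a semantic similarity measure."""
    v1 = np.mean([word_vecs[w] for w in s1.split() if w in word_vecs], axis=0)
    v2 = np.mean([word_vecs[w] for w in s2.split() if w in word_vecs], axis=0)
    return alpha * jaccard(s1, s2) + (1 - alpha) * cosine(v1, v2)

# toy word vectors for illustration only
vecs = {"the": np.array([1.0, 0.0]),
        "cat": np.array([0.0, 1.0]),
        "sat": np.array([1.0, 1.0])}
```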

23 citations

Journal ArticleDOI
Min Dong, Li Yongfa, Xue Tang, Jingyun Xu, Sheng Bi, Yi Cai
TL;DR: A convolutional neural network based on multiple convolution and pooling operations for text sentiment classification (variable convolution and pooling convolutional neural network, VCPCNN) is proposed.
Abstract: With the popularity of the internet, the means of expressing emotion and communicating are becoming increasingly abundant, and most of these emotions are transmitted in text form. Text sentiment classification research mainly covers three families of methods: those based on sentiment dictionaries, on machine learning, and on deep learning. In recent years, many deep learning-based works have used TextCNN (text convolutional neural network) to extract text semantic information for sentiment analysis. However, TextCNN only considers the length of the sentence when extracting semantic information; it ignores the semantic features between word vectors, and its pooling layer keeps only the maximum feature value of each feature map while discarding other information. Therefore, in this paper, we propose a convolutional neural network based on multiple convolution and pooling operations for text sentiment classification (variable convolution and pooling convolutional neural network, VCPCNN). This paper makes three contributions. First, a multi-convolution and multi-pooling structure is proposed on top of the TextCNN network. Second, four convolution operations are introduced along the word embedding dimension, which helps mine local features in the semantic dimensions of word vectors. Finally, average pooling is introduced in the pooling layer, which helps preserve important information in the extracted features. Verification tests were carried out on four sentiment datasets: English polarity, Chinese polarity, Chinese subjective/objective sentiment, and Chinese multi-category sentiment. Our approach is effective: its result was up to 1.97% higher than that of the TextCNN network.
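Two of the paper's ideas, convolving along the word-embedding dimension and keeping average-pooled as well as max-pooled feature values, can be sketched in numpy. This is a toy feature extractor illustrating those two steps, not the full VCPCNN architecture, and the kernel and input sizes are invented:

```python
import numpy as np

def conv1d(row_matrix, kernel):
    """Valid 1-D convolution of each row with the kernel."""
    k = len(kernel)
    return np.array([[row[i:i + k] @ kernel for i in range(len(row) - k + 1)]
                     for row in row_matrix])

def vcpcnn_features(sentence_matrix, kernel):
    """Convolve along the embedding dimension (rows = words, columns =
    embedding dims), then keep both max- and average-pooled values of
    each feature map instead of the max alone."""
    fmap = conv1d(sentence_matrix, kernel)
    return np.concatenate([fmap.max(axis=1),    # max pooling
                           fmap.mean(axis=1)])  # average pooling

# toy sentence of 4 words with 6-dimensional embeddings
mat = np.arange(24.0).reshape(4, 6)
feats = vcpcnn_features(mat, np.array([1.0, 0.0, -1.0]))
print(feats.shape)  # (8,)
```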

23 citations

Journal ArticleDOI
TL;DR: This study improves text mining of technological information by proposing a methodology for designing a TechWord-based lexical database, built on the lexical characteristics that differentiate technological words from general words, and by automatically structuring word-to-word link information in technological documents.
Abstract: Text mining of technological documents such as patents plays an important role in technology intelligence for technology R&D planning. In addition, WordNet, an English lexical database, is widely used to pre-process text data, e.g., for word lemmatization and synonym search. However, technological vocabulary is complex and domain-specific, and WordNet's ability to reflect technological features is limited. Thus, to improve text mining performance on technological information, this study proposes a methodology for designing a TechWord-based lexical database built on the lexical characteristics that differentiate technological words from general words. To do this, we define TechWord, a unit of technological lexical information, and construct TechSynset, a set of synonyms between TechWords. First, through dependency parsing between words, each TechWord, a unit word that describes a technology, is structured, and its nouns and verbs are identified. The importance of connectivity is investigated by a network centrality analysis based on the dependency relations of words. Subsequently, to search for synonyms suited to the target technology domain, a TechSynset is constructed from synset information, with an additional analysis that calculates cosine similarity based on word embedding vectors. Applying the proposed methodology to actual technology-related information, we collect patent data in the automotive field and present the resulting TechWords and TechSynsets. This study improves text mining of technological information by automatically structuring the word-to-word link information in technological documents.
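The cosine-similarity step used to rank synonym candidates for a TechSynset can be sketched as a nearest-neighbour search over embedding vectors. The toy vectors below are invented for illustration; in the paper they would come from embeddings trained on patent text:

```python
import numpy as np

def nearest_terms(query, vocab_vecs, k=2):
    """Rank candidate synonyms for a term by cosine similarity of their
    word-embedding vectors and return the top k."""
    q = vocab_vecs[query]
    q = q / np.linalg.norm(q)
    scores = {w: float(v @ q / np.linalg.norm(v))
              for w, v in vocab_vecs.items() if w != query}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# toy embedding vectors: "car" is deliberately close to "automobile"
vecs = {"car": np.array([1.0, 0.1]),
        "automobile": np.array([0.9, 0.15]),
        "engine": np.array([0.1, 1.0])}
print(nearest_terms("car", vecs, k=1))  # ['automobile']
```

In practice a similarity threshold, rather than a fixed k, would decide which candidates enter the synset; the paper does not specify the cutoff here.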

23 citations

Proceedings ArticleDOI
02 Jul 2017
TL;DR: The impact of initializing the training of a neural network natural language processing algorithm with pre-defined clinical word embeddings to improve feature extraction and relationship classification between entities is studied.
Abstract: Electronic Health Record (EHR) narratives are a rich source of information, embedding high-resolution information of value for secondary research use. However, because EHRs are mostly natural-language free text and highly ambiguous, many natural language processing algorithms have been devised around them to extract meaningful structured information about clinical entities. The performance of these algorithms, however, varies largely depending on the training dataset as well as on how effectively background knowledge is used to steer the learning process. In this paper we study the impact of initializing the training of a neural network natural language processing algorithm with pre-defined clinical word embeddings to improve feature extraction and relationship classification between entities. We add our embedding framework to a bi-directional long short-term memory (Bi-LSTM) neural network, and further study the effect of using attention weights in neural networks for sequence labelling tasks to extract knowledge of Adverse Drug Reactions (ADRs). We incorporate unsupervised word embeddings built with Word2Vec and GloVe from widely available medical resources such as the Multiparameter Intelligent Monitoring in Intensive Care (MIMIC) II corpus and the Unified Medical Language System (UMLS), and also embed a pharmacological lexicon derived from available EHRs. Our algorithm, evaluated on two datasets, shows that our architecture outperforms baseline Bi-LSTM networks and Bi-LSTM networks using linear-chain and skip-chain conditional random fields (CRFs).
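The initialization step the paper studies, seeding a network's embedding layer with pre-trained clinical vectors before training, amounts to building the embedding matrix from a pre-trained lookup. A minimal numpy sketch; `pretrained` stands in for Word2Vec/GloVe vectors trained on MIMIC-style corpora, and out-of-vocabulary words get small random vectors (both names and the fallback scheme are assumptions for illustration):

```python
import numpy as np

def build_embedding_matrix(vocab, pretrained, dim, seed=0):
    """Initialize an embedding layer from pre-trained vectors: each row i
    holds the vector for vocab[i]; words missing from the pre-trained
    set are initialized with small random values."""
    rng = np.random.default_rng(seed)
    matrix = np.zeros((len(vocab), dim))
    for i, word in enumerate(vocab):
        vec = pretrained.get(word)
        matrix[i] = vec if vec is not None else rng.normal(scale=0.1, size=dim)
    return matrix

# toy vocabulary: "drug" has a pre-trained vector, "rash" does not
vocab = ["drug", "rash"]
pretrained = {"drug": np.ones(3)}
M = build_embedding_matrix(vocab, pretrained, dim=3)
print(M.shape)  # (2, 3)
```

The resulting matrix would be handed to the Bi-LSTM's embedding layer and either frozen or fine-tuned during training.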

23 citations


Network Information
Related Topics (5)
  Recurrent neural network: 29.2K papers, 890K citations (87% related)
  Unsupervised learning: 22.7K papers, 1M citations (86% related)
  Deep learning: 79.8K papers, 2.1M citations (85% related)
  Reinforcement learning: 46K papers, 1M citations (84% related)
  Graph (abstract data type): 69.9K papers, 1.2M citations (84% related)
Performance Metrics
No. of papers in the topic in previous years:

  Year    Papers
  2023    317
  2022    716
  2021    736
  2020    1,025
  2019    1,078
  2018    788