scispace - formally typeset

Word embedding

About: Word embedding is a research topic. Over its lifetime, 4,683 publications on this topic have received 153,378 citations. The topic is also known as: word embeddings.


Papers
Journal ArticleDOI
TL;DR: Unsupervised features obtained with a Structured Skip-gram model contribute to the better performance achieved in the FIRE2015 entity extraction task.

23 citations

Journal ArticleDOI
TL;DR: Wang et al. proposed a shuffling strategy that transforms related words and APIs into tuples to address the alignment challenge; using these tuples, Word2API models words and APIs simultaneously.
Abstract: Developers increasingly rely on text matching tools to analyze the relation between natural language words and APIs. However, semantic gaps, namely textual mismatches between words and APIs, negatively affect these tools. Previous studies have transformed words or APIs into low-dimensional vectors for matching; however, inaccurate results were obtained because words and APIs were not modeled simultaneously. To resolve this problem, two main challenges must be addressed: the acquisition of massive numbers of words and APIs for mining, and the alignment of words and APIs for modeling. This study therefore proposes Word2API to effectively estimate the relatedness of words and APIs. Word2API collects millions of commonly used words and APIs from code repositories to address the acquisition challenge. A shuffling strategy then transforms related words and APIs into tuples to address the alignment challenge. Using these tuples, Word2API models words and APIs simultaneously. Word2API outperforms baselines by 10-49.6 percent in relatedness estimation in terms of precision and NDCG. Word2API is also effective in solving typical software tasks, e.g., query expansion and API document linking. A simple system with Word2API-expanded queries recommends up to 21.4 percent more related APIs for developers. Meanwhile, Word2API improves comparison algorithms by 7.9-17.4 percent when linking questions from Question & Answer communities to API documents.
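The core of the alignment idea above is that a method's description words and its API calls are mixed into a single training sequence, so a word2vec-style model sees them in shared contexts. A minimal sketch of that shuffling step, with illustrative token names (the real Word2API pipeline differs in detail):

```python
import random

def make_training_tuple(word_seq, api_seq, seed=0):
    """Mix a method's description words with its API calls into one
    shuffled sequence, so related words and APIs land near each other
    in the training windows of a word2vec-style model.
    Toy sketch of the shuffling strategy; tokens are illustrative."""
    rng = random.Random(seed)
    combined = list(word_seq) + list(api_seq)
    rng.shuffle(combined)  # interleave words and APIs in one tuple
    return combined

# Hypothetical example: description words plus the APIs the method calls.
words = ["read", "file", "lines"]
apis = ["java.io.BufferedReader.readLine", "java.io.FileReader.new"]
training_tuple = make_training_tuple(words, apis)
# Feeding many such mixed tuples to a word-embedding model places words
# and APIs in the same vector space, enabling relatedness estimation.
```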

23 citations

Journal ArticleDOI
TL;DR: This study shows that relationships between geographic and semantic spaces arise when word embedding models are applied to a corpus of documents in Mexican Spanish, and the resulting models achieve high accuracy for geographic named entity recognition in Spanish.
Abstract: In recent years, dense word embeddings for text representation have been widely used since they can model complex semantic and morphological characteristics of language, such as meaning in specific contexts and applications. In contrast to sparse representations, such as one-hot encoding or frequency counts, word embeddings provide computational advantages and improve results in many natural language processing tasks, such as the automatic extraction of geospatial information. Computer systems capable of discovering geographic information from natural language involve a complex process called geoparsing. In this work, we explore the use of word embeddings for two NLP tasks: Geographic Named Entity Recognition and Geographic Entity Disambiguation, as an effort to develop the first Mexican geoparser. Our study shows that relationships between geographic and semantic spaces arise when we apply word embedding models to a corpus of documents in Mexican Spanish. Our models achieved high accuracy for geographic named entity recognition in Spanish.
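Geographic entity disambiguation of the kind described above can be reduced to a nearest-vector decision: among the candidate places sharing a name, pick the one whose embedding is closest to the surrounding context. A toy sketch with made-up two-dimensional vectors (a real geoparser would use trained, high-dimensional embeddings):

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def disambiguate(context_vec, candidates):
    """Return the candidate place whose embedding is closest to the
    context vector. Vectors here are illustrative toy values."""
    return max(candidates, key=lambda name: cosine(context_vec, candidates[name]))

# "Guadalajara" is ambiguous between Mexico and Spain; a context about
# Jalisco should pull the decision toward the Mexican entity.
candidates = {
    "Guadalajara, Jalisco, MX": [0.9, 0.1],
    "Guadalajara, ES": [0.1, 0.9],
}
disambiguate([0.8, 0.2], candidates)
```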

23 citations

Journal ArticleDOI
TL;DR: This work proposes the use of word embeddings to identify the simplest synonym for a given term, providing a promising approach to simplifying DPLs without terminological resources or parallel corpora.
Abstract: Drug Package Leaflets (DPLs) provide information for patients on how to safely use medicines. Pharmaceutical companies are responsible for producing these documents. However, several studies have shown that patients usually have problems understanding the sections describing posology (dosage quantity and prescription), contraindications and adverse drug reactions. An ultimate goal of this work is to provide an automatic approach that helps these companies write drug package leaflets in easy-to-understand language. Natural language processing has become a powerful tool for improving patient care and advancing medicine because it makes it possible to automatically process the large amount of unstructured information needed for patient care. However, to the best of our knowledge, no research has been done on the automatic simplification of drug package leaflets. In a previous work, we proposed using domain terminological resources to gather a set of synonyms for a given target term. A potential drawback of this approach is that it depends heavily on the existence of dictionaries; these are not always available for every domain and language, and where they do exist their coverage is often scarce. To overcome this limitation, we propose the use of word embeddings to identify the simplest synonym for a given term. Word embedding models represent each word in a corpus with a vector in a semantic space. Our approach is based on the assumption that synonyms should have close vectors because they occur in similar contexts. In our evaluation, we used the EasyDPL (Easy Drug Package Leaflets) corpus, a collection of 306 leaflets written in Spanish and manually annotated with 1,400 adverse drug effects and their simplest synonyms. We focus on leaflets written in Spanish because it is the second most widely spoken language in the world, yet in terms of terminological resources Spanish is usually less well covered than English. Our experiments show an accuracy of 38.5% using word embeddings. This work provides a promising approach to simplifying DPLs without using terminological resources or parallel corpora. Moreover, it could easily be adapted to different domains and languages. However, more research effort is needed, because our embedding-based approach does not yet outperform our previous dictionary-based work.
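The assumption above, that synonyms have close vectors, suggests a simple procedure: take the nearest embedding neighbours of a technical term and pick the one a patient is most likely to know, approximated here by corpus frequency. A hypothetical sketch with toy vectors and counts, not the paper's exact method:

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def simplest_synonym(term, vectors, freq, k=2):
    """Take the k nearest neighbours of `term` in embedding space and
    return the most frequent one, treating frequency as a proxy for
    simplicity. Vectors and counts below are illustrative toys."""
    neighbours = sorted(
        (w for w in vectors if w != term),
        key=lambda w: cosine(vectors[term], vectors[w]),
        reverse=True,
    )[:k]
    return max(neighbours, key=lambda w: freq.get(w, 0))

vectors = {
    "pyrexia": [1.0, 0.1],
    "fever": [0.95, 0.15],
    "hyperthermia": [0.9, 0.3],
    "rash": [0.1, 1.0],
}
freq = {"fever": 900, "hyperthermia": 40, "rash": 500}
simplest_synonym("pyrexia", vectors, freq)  # the common word "fever"
```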

23 citations

Posted Content
TL;DR: This paper presents the first multi-modal framework for evaluating English word representations based on cognitive lexical semantics, and finds strong correlations in the results between cognitive datasets, across recording modalities and to their performance on extrinsic NLP tasks.
Abstract: An interesting method of evaluating word representations is by how much they reflect the semantic representations in the human brain. However, most, if not all, previous works focus only on small datasets and a single modality. In this paper, we present the first multi-modal framework for evaluating English word representations based on cognitive lexical semantics. Six types of word embeddings are evaluated by fitting them to 15 datasets of eye-tracking, EEG and fMRI signals recorded during language processing. To achieve a global score over all evaluation hypotheses, we apply statistical significance testing that accounts for the multiple comparisons problem. This framework is easily extensible to include other intrinsic and extrinsic evaluation methods. We find strong correlations in the results between cognitive datasets, across recording modalities, and with performance on extrinsic NLP tasks.
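When one global score is aggregated over many evaluation hypotheses, each individual test must be held to a stricter threshold or spurious "significant" fits accumulate. A generic Bonferroni correction illustrates the idea; the paper's actual procedure may differ:

```python
def bonferroni(p_values, alpha=0.05):
    """Flag which hypotheses remain significant after a Bonferroni
    correction: each p-value is compared against alpha / m, where m is
    the number of simultaneous tests. Generic sketch, not the paper's
    exact testing procedure."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

# Three tests at alpha = 0.05 share a corrected threshold of ~0.0167,
# so only the strongest result survives.
bonferroni([0.001, 0.04, 0.2])
```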

23 citations


Network Information
Related Topics (5)
Recurrent neural network: 29.2K papers, 890K citations, 87% related
Unsupervised learning: 22.7K papers, 1M citations, 86% related
Deep learning: 79.8K papers, 2.1M citations, 85% related
Reinforcement learning: 46K papers, 1M citations, 84% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 84% related
Performance Metrics
No. of papers in the topic in previous years

Year    Papers
2023    317
2022    716
2021    736
2020    1,025
2019    1,078
2018    788