Topic

Word embedding

About: Word embedding is a research topic. Over its lifetime, 4,683 publications have been published within this topic, receiving 153,378 citations. The topic is also known as: word embeddings.


Papers
Journal ArticleDOI
03 Apr 2020
TL;DR: A novel NMT model with a new word embedding transition technique for fast domain adaptation is presented, along with a new meta-learning-based training strategy that updates the model parameters and meta parameters alternately.
Abstract: Neural machine translation (NMT) models have achieved state-of-the-art translation quality when a large quantity of parallel corpora is available. However, their performance suffers significantly on domain-specific translations, for which training data are usually scarce. In this paper, we present a novel NMT model with a new word embedding transition technique for fast domain adaptation. We propose to split the parameters of the model into two groups: model parameters and meta parameters. The former are used to model the translation while the latter are used to adjust the representational space so that the model generalizes to different domains. We mimic the domain adaptation of the machine translation model to low-resource domains using multiple translation tasks on different domains. A new training strategy based on meta-learning is developed along with the proposed model to update the model parameters and meta parameters alternately. Experiments on datasets from different domains showed substantial improvements in NMT performance with a limited amount of data.
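As a rough sketch of the alternating update scheme this abstract describes, the toy code below splits a model's parameters into a "model" group and a "meta" group and updates them in turn across domain tasks. Everything here (the MetaNMT stand-in, the synthetic domains, the learning rates) is an illustrative assumption, not the authors' architecture.

```python
import torch

class MetaNMT(torch.nn.Module):
    """Toy stand-in for the paper's model: meta parameters reshape the
    embedding space, model parameters score the translation."""
    def __init__(self, vocab_size=100, dim=16):
        super().__init__()
        self.embed_transition = torch.nn.Linear(dim, dim)   # meta parameters
        self.translator = torch.nn.Linear(dim, vocab_size)  # model parameters

    def forward(self, x):
        return self.translator(torch.tanh(self.embed_transition(x)))

model = MetaNMT()
model_opt = torch.optim.Adam(model.translator.parameters(), lr=1e-3)
meta_opt = torch.optim.Adam(model.embed_transition.parameters(), lr=1e-4)
loss_fn = torch.nn.CrossEntropyLoss()

# Each "domain" is a toy batch of (source embedding, target token) pairs.
domains = [(torch.randn(8, 16), torch.randint(0, 100, (8,))) for _ in range(4)]

for step in range(20):
    x, y = domains[step % len(domains)]             # sample a domain task
    loss = loss_fn(model(x), y)
    opt = model_opt if step % 2 == 0 else meta_opt  # alternate the two groups
    opt.zero_grad()
    loss.backward()
    opt.step()
```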

22 citations

Proceedings ArticleDOI
12 Aug 2019
TL;DR: This paper aims at an automated approach that can map APIs across languages with much less a priori knowledge than other approaches, based on a realization of the notion of domain adaptation, combined with code embedding, to better align two vector spaces.
Abstract: To save effort, developers often translate programs from one programming language to another instead of implementing them from scratch. Translating the application program interfaces (APIs) used in one language to functionally equivalent ones available in another language is an important aspect of program translation. Existing approaches facilitate the translation by automatically identifying API mappings across programming languages, but they still require large amounts of parallel corpora, ranging from pairs of functionally equivalent APIs or code fragments to similar code comments. To minimize the need for parallel corpora, this paper aims at an automated approach that can map APIs across languages with much less a priori knowledge than other approaches. The approach is based on a realization of the notion of domain adaptation, combined with code embedding, to better align two vector spaces. Taking large sets of programs as input, our approach first generates numeric vector representations of the programs (including the APIs used in each language) and adapts generative adversarial networks (GAN) to align the vectors across the spaces of the two languages. For a better alignment, we initialize the GAN with parameters derived from API mapping seeds that can be identified accurately with a simple automatic signature-based matching heuristic. Cross-language API mappings can then be identified via nearest-neighbor queries in the aligned vector spaces. We have implemented the approach (SAR, named after the three main technical components of the approach) in a prototype for mapping APIs across Java and C# programs. Our evaluation on about 2 million Java files and 1 million C# files shows that the approach achieves 54% and 82% mapping accuracy in its top-1 and top-10 API mapping results with only 174 automatically identified seeds, more accurate than other approaches using the same or many more mapping seeds.
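The sketch below illustrates the general recipe the abstract outlines: seed a linear mapping between two embedding spaces from a few known pairs, refine it adversarially with a discriminator, then answer mapping queries by nearest neighbors. Random vectors and made-up seed indices stand in for real code embeddings; this is an assumption-heavy illustration, not the SAR implementation.

```python
import torch
import torch.nn.functional as F

dim = 32
java_vecs = torch.randn(500, dim)   # Java API embeddings (placeholder data)
cs_vecs = torch.randn(400, dim)     # C# API embeddings (placeholder data)

# Initialize the linear mapping from a few signature-matched seed pairs
# (hypothetical indices), mirroring the seed-based GAN initialization.
seed_pairs = [(0, 0), (1, 1), (2, 2)]
A = java_vecs[[i for i, _ in seed_pairs]]
B = cs_vecs[[j for _, j in seed_pairs]]
mapper = torch.nn.Linear(dim, dim, bias=False)
with torch.no_grad():
    mapper.weight.copy_((torch.linalg.pinv(A) @ B).T)  # least-squares seed fit

disc = torch.nn.Sequential(torch.nn.Linear(dim, 64), torch.nn.ReLU(),
                           torch.nn.Linear(64, 1))
map_opt = torch.optim.Adam(mapper.parameters(), lr=1e-4)
disc_opt = torch.optim.Adam(disc.parameters(), lr=1e-4)

for _ in range(200):
    # Discriminator: tell mapped Java vectors (label 0) from real C# (label 1).
    d_loss = (F.binary_cross_entropy_with_logits(
                  disc(mapper(java_vecs).detach()), torch.zeros(500, 1))
              + F.binary_cross_entropy_with_logits(
                  disc(cs_vecs), torch.ones(400, 1)))
    disc_opt.zero_grad(); d_loss.backward(); disc_opt.step()
    # Mapper: fool the discriminator so the two spaces align.
    g_loss = F.binary_cross_entropy_with_logits(
        disc(mapper(java_vecs)), torch.ones(500, 1))
    map_opt.zero_grad(); g_loss.backward(); map_opt.step()

# After alignment, API mappings come from nearest-neighbor queries.
sims = F.normalize(mapper(java_vecs)) @ F.normalize(cs_vecs).T
top10 = sims.topk(10, dim=1).indices   # top-10 C# candidates per Java API
```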

22 citations

Journal ArticleDOI
01 Sep 2020
TL;DR: It is found that the constructed sentiment indices are informative for predicting the returns and volatility of the cryptocurrency market index.
Abstract: We study investor sentiment on a non-classical asset such as cryptocurrency using machine learning methods. We account for context-specific information and word similarity using efficient language modeling tools such as the construction of featurized word representations (embeddings) and recursive neural networks. We apply these tools to sentence-level sentiment classification and sentiment index construction. This analysis is performed on a novel dataset of 1,220K messages related to 425 cryptocurrencies posted on the microblogging platform StockTwits between March 2013 and May 2018. Both in- and out-of-sample predictive regressions are run to test the significance of the constructed sentiment index variables. We find that the constructed sentiment indices are informative for predicting the returns and volatility of the cryptocurrency market index.
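To make the index-construction and regression steps concrete, here is a minimal sketch that aggregates message-level sentiment scores into a daily index and runs an in-sample predictive regression of next-day returns on it. The message scores and returns are random placeholders, and mean-per-day aggregation is an assumption rather than the paper's exact construction.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
days = pd.date_range("2017-01-01", periods=90)

# Message-level sentiment would come from the trained classifier;
# random labels stand in here.
msgs = pd.DataFrame({
    "date": days[rng.integers(0, 90, 5000)],
    "sentiment": rng.choice([-1.0, 0.0, 1.0], 5000),
})

# Daily sentiment index: mean message sentiment per day.
index = msgs.groupby("date")["sentiment"].mean()

# Hypothetical daily market-index returns aligned to the same dates.
returns = pd.Series(rng.normal(0, 0.03, len(index)), index=index.index)

# In-sample predictive regression: r_{t+1} = a + b * index_t + e.
x = index.iloc[:-1].to_numpy()
y = returns.iloc[1:].to_numpy()
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print("intercept, slope:", beta)
```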

22 citations

Proceedings Article
01 Aug 2018
TL;DR: The authors propose a method for embedding words as probability densities in a low-dimensional space, generating the embedding from a word-specific prior density for each occurrence of a given word.
Abstract: We introduce a method for embedding words as probability densities in a low-dimensional space. Rather than assuming that a word embedding is fixed across the entire text collection, as in standard word embedding methods, in our Bayesian model we generate it from a word-specific prior density for each occurrence of a given word. Intuitively, for each word, the prior density encodes the distribution of its potential ‘meanings’. These prior densities are conceptually similar to the Gaussian embeddings of Vilnis and McCallum (2014). Interestingly, unlike the Gaussian embeddings, we can also obtain context-specific densities: they encode uncertainty about the sense of a word given its context and correspond to the approximate posterior distributions within our model. The context-dependent densities have many potential applications: for example, we show that they can be used directly in the lexical substitution task. We describe an effective estimation method based on the variational autoencoding framework. We demonstrate the effectiveness of our embedding technique on a range of standard benchmarks.
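A minimal sketch of the core idea under stated assumptions: each word carries a Gaussian prior, a context encoder yields an occurrence-specific approximate posterior, and a KL term ties that posterior back to the word's prior, with the reparameterization trick standing in for the paper's variational autoencoding machinery. All names and sizes are illustrative.

```python
import torch

V, D, H = 1000, 32, 64  # vocab size, embedding dim, context-vector dim

prior_mu = torch.nn.Embedding(V, D)       # word-specific prior mean
prior_logvar = torch.nn.Embedding(V, D)   # word-specific prior log-variance
encoder = torch.nn.Linear(H + D, 2 * D)   # context + word -> posterior params

def occurrence_embedding(word_id, context_vec):
    """Sample an embedding for ONE occurrence of `word_id` in context."""
    stats = encoder(torch.cat([context_vec, prior_mu(word_id)], dim=-1))
    mu, logvar = stats.chunk(2, dim=-1)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
    # KL(posterior || word prior) pulls each occurrence toward the
    # word's prior density, which encodes its potential 'meanings'.
    p_mu, p_lv = prior_mu(word_id), prior_logvar(word_id)
    kl = 0.5 * ((logvar - p_lv).exp() + (mu - p_mu).pow(2) / p_lv.exp()
                - 1 + p_lv - logvar).sum(-1)
    return z, kl

z, kl = occurrence_embedding(torch.tensor([7]), torch.randn(1, H))
```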

22 citations

Proceedings ArticleDOI
Hongliang Fei, Shulong Tan, Ping Li
25 Jul 2019
TL;DR: An automatic way to accelerate the development of medical synonymy resources for Chinese, covering both formal entities from healthcare professionals and noisy descriptions from end-users, is proposed, and a large Chinese medical text corpus that includes annotations for entities, descriptions, and synonymous pairs is created.
Abstract: Automatic synonym recognition is of great importance for entity-centric text mining and interpretation. Due to the high variability of language use in real life, manual construction of semantic resources to cover all synonyms is prohibitively expensive and may also result in limited coverage. Although public knowledge bases exist, they have only limited coverage for languages other than English. In this paper, we focus on the medical domain and propose an automatic way to accelerate the development of medical synonymy resources for Chinese, covering both formal entities from healthcare professionals and noisy descriptions from end-users. Motivated by the success of distributed word representations, we design a multi-task model with a hierarchical task relationship to learn more representative entity/term embeddings and apply them to synonym prediction. In our model, we extend the classical skip-gram word embedding model by introducing an auxiliary task, "neighboring word semantic type prediction", and organize the tasks hierarchically based on task complexity. Meanwhile, we incorporate existing medical term-term synonymy knowledge into our word embedding learning framework. We demonstrate that the embeddings trained with our proposed multi-task model yield significant improvements over baselines for entity semantic relatedness evaluation, neighboring word semantic type prediction, and synonym prediction. Furthermore, we create a large Chinese medical text corpus that includes annotations for entities, descriptions, and synonymous pairs for future research in this direction.
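As a rough sketch of the multi-task extension described above, the code below adds an auxiliary "neighboring word semantic type" head on top of a skip-gram-style embedding and trains both heads jointly. The data, dimensions, and the 0.5 auxiliary weight are placeholders, not the paper's configuration.

```python
import torch

V, T, D = 5000, 10, 64  # vocab size, number of semantic types, embedding dim

embed = torch.nn.Embedding(V, D)
word_head = torch.nn.Linear(D, V)  # main task: predict the neighboring word
type_head = torch.nn.Linear(D, T)  # auxiliary: predict the neighbor's type
opt = torch.optim.Adam([*embed.parameters(), *word_head.parameters(),
                        *type_head.parameters()], lr=1e-3)
xent = torch.nn.CrossEntropyLoss()

# Placeholder training triples: (center word, neighbor word, neighbor's type).
center = torch.randint(0, V, (256,))
neighbor = torch.randint(0, V, (256,))
neigh_type = torch.randint(0, T, (256,))

for _ in range(10):
    h = embed(center)
    # Down-weighting the simpler auxiliary task loosely reflects the
    # hierarchical organization by task complexity (0.5 is an assumption).
    loss = xent(word_head(h), neighbor) + 0.5 * xent(type_head(h), neigh_type)
    opt.zero_grad()
    loss.backward()
    opt.step()
```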

22 citations


Network Information
Related Topics (5)

Topic                        Papers   Citations   Related
Recurrent neural network     29.2K    890K        87%
Unsupervised learning        22.7K    1M          86%
Deep learning                79.8K    2.1M        85%
Reinforcement learning       46K      1M          84%
Graph (abstract data type)   69.9K    1.2M        84%
Performance Metrics
No. of papers in the topic in previous years

Year   Papers
2023   317
2022   716
2021   736
2020   1,025
2019   1,078
2018   788