Open Access · Posted Content

Enriching Word Vectors with Subword Information

TLDR
A new approach based on the skipgram model is proposed, in which each word is represented as a bag of character n-grams and a word vector is the sum of these n-gram representations; the method achieves state-of-the-art performance on word similarity and analogy tasks.
Abstract
Continuous word representations, trained on large unlabeled corpora, are useful for many natural language processing tasks. Popular models that learn such representations ignore the morphology of words by assigning a distinct vector to each word. This is a limitation, especially for languages with large vocabularies and many rare words. In this paper, we propose a new approach based on the skipgram model, where each word is represented as a bag of character $n$-grams. A vector representation is associated with each character $n$-gram; words are represented as the sum of these representations. Our method is fast, allowing models to be trained on large corpora quickly, and allows us to compute word representations for words that did not appear in the training data. We evaluate our word representations on nine different languages, on both word similarity and analogy tasks. By comparing to recently proposed morphological word representations, we show that our vectors achieve state-of-the-art performance on these tasks.
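As a rough illustration of the idea described in the abstract (not the authors' released fastText implementation), the sketch below extracts character n-grams from a word wrapped in boundary symbols and sums their vectors; `ngram_vectors` is a hypothetical dictionary of already-trained n-gram embeddings, and the n-gram sizes are placeholders.

```python
import numpy as np

def char_ngrams(word, n_min=3, n_max=6):
    """Extract character n-grams from a word wrapped in boundary symbols
    (the 'bag of character n-grams' from the abstract)."""
    token = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(token) - n + 1):
            grams.append(token[i:i + n])
    return grams

def word_vector(word, ngram_vectors, dim=300):
    """Sum the vectors of the word's character n-grams.
    `ngram_vectors` is a hypothetical dict of trained n-gram embeddings;
    n-grams absent from it are skipped, so even out-of-vocabulary words
    still receive a representation."""
    vec = np.zeros(dim)
    for g in char_ngrams(word):
        if g in ngram_vectors:
            vec += ngram_vectors[g]
    return vec
```

The released fastText code additionally treats the full sequence `<word>` as one of the units and hashes n-grams into a fixed number of buckets for memory efficiency; the explicit dictionary above is kept only for readability.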


Citations
Posted Content

Recent Trends in Deep Learning Based Natural Language Processing

TL;DR: Deep learning methods employ multiple processing layers to learn hierarchical representations of data and have produced state-of-the-art results in many domains, including natural language processing (NLP); this paper reviews recent trends in deep-learning-based NLP.
Posted Content

Poincaré Embeddings for Learning Hierarchical Representations

TL;DR: The authors embed symbolic data into an n-dimensional Poincaré ball to learn parsimonious representations that simultaneously capture hierarchy and similarity, using Riemannian optimization to learn the embeddings.
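As a hedged illustration of the geometry involved (not the authors' code), the distance between two points inside the unit Poincaré ball can be computed as follows; `u` and `v` are assumed to have norm strictly less than 1.

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Poincare-ball distance:
    d(u, v) = arcosh(1 + 2 * ||u - v||^2 / ((1 - ||u||^2) * (1 - ||v||^2)))."""
    uu = np.dot(u, u)
    vv = np.dot(v, v)
    duv = np.dot(u - v, u - v)
    arg = 1.0 + 2.0 * duv / max((1.0 - uu) * (1.0 - vv), eps)
    return np.arccosh(arg)
```

Hierarchy emerges because points near the origin are close to everything (suited to general concepts), while points near the boundary are far from most others (suited to specific leaves).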
Proceedings Article (DOI)

Deep Learning for Entity Matching: A Design Space Exploration

TL;DR: The results show that DL does not outperform current solutions on structured EM, but it can significantly outperform them on textual and dirty EM, which suggests that practitioners should seriously consider using DL for textual and dirty EM problems.
Posted Content

Optimal Hyperparameters for Deep LSTM-Networks for Sequence Labeling Tasks

TL;DR: This paper evaluates the importance of different network design choices and hyperparameters for five common linguistic sequence tagging tasks and finds that some parameters, like the pre-trained word embeddings or the last layer of the network, have a large impact on performance, while other parameters, for example the number of LSTM layers or the number of recurrent units, are of minor importance.
Journal Article (DOI)

Deep Recurrent Neural Network vs. Support Vector Machine for Aspect-based Sentiment Analysis of Arabic Hotels’ Reviews

TL;DR: State-of-the-art approaches based on supervised machine learning are presented to address the challenges of aspect-based sentiment analysis (ABSA) of Arabic hotels’ reviews; the SVM approach outperforms the deep RNN approach on the investigated tasks.
References
Posted Content

Efficient Estimation of Word Representations in Vector Space

TL;DR: This paper proposes two novel model architectures for computing continuous vector representations of words from very large data sets; the quality of these representations is measured on a word similarity task, and the results are compared to the previously best-performing techniques based on different types of neural networks.
Journal Article (DOI)

Indexing by Latent Semantic Analysis

TL;DR: A new method for automatic indexing and retrieval is described that takes advantage of implicit higher-order structure in the association of terms with documents (“semantic structure”) to improve the detection of relevant documents on the basis of terms found in queries.
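A compact, hedged sketch of the core computation behind latent semantic analysis, using scikit-learn rather than the original implementation: build a weighted term-document matrix, take a truncated SVD, and compare documents in the resulting low-rank "semantic" space. The corpus and rank below are illustrative only.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

docs = [
    "human machine interface for computer applications",
    "a survey of user opinion of computer system response time",
    "graph minors and widths of trees",
]

# Document-term matrix weighted by tf-idf.
X = TfidfVectorizer().fit_transform(docs)

# A truncated SVD keeps only the top-k singular directions,
# which is the "implicit higher-order structure" LSA exploits.
lsa = TruncatedSVD(n_components=2, random_state=0)
doc_vectors = lsa.fit_transform(X)

# Documents (and folded-in queries) are then compared by cosine
# similarity in this reduced space rather than by raw term overlap.
```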
Posted Content

Distributed Representations of Words and Phrases and their Compositionality

TL;DR: In this paper, the Skip-gram model is used to learn high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships, together with extensions that improve both the quality of the vectors and the training speed.
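For context, a small hedged usage sketch with the gensim library (a common reimplementation, not the original word2vec C tool), enabling the skip-gram architecture with negative sampling discussed in that paper; the toy corpus and parameter values are placeholders.

```python
from gensim.models import Word2Vec

# Toy corpus: each sentence is a list of tokens.
sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["dogs", "and", "cats", "are", "pets"],
]

model = Word2Vec(
    sentences,
    vector_size=100,   # embedding dimension
    window=5,          # context window size
    sg=1,              # 1 = skip-gram (0 = CBOW)
    negative=5,        # negative sampling instead of a full softmax
    min_count=1,
    epochs=10,
)

# Nearest neighbours in the learned vector space.
print(model.wv.most_similar("cat", topn=3))
```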
Proceedings Article (DOI)

Neural Machine Translation of Rare Words with Subword Units

TL;DR: This paper introduces a simpler and more effective approach, making the NMT model capable of open-vocabulary translation by encoding rare and unknown words as sequences of subword units, and empirically shows that subword models improve over a back-off dictionary baseline for the WMT 15 translation tasks English-German and English-Russian by 1.3 BLEU.
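The subword units referred to here are learned with byte-pair encoding applied to word segmentation. The sketch below follows that general idea with a toy vocabulary and made-up frequencies: repeatedly merge the most frequent adjacent symbol pair so that rare words end up split into frequent subword units.

```python
import re
from collections import Counter

def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    """Replace every occurrence of the pair with its concatenation."""
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    return {pattern.sub("".join(pair), word): freq for word, freq in vocab.items()}

# Toy vocabulary: words are space-separated symbols ending in an end-of-word marker.
vocab = {"l o w </w>": 5, "l o w e r </w>": 2, "n e w e s t </w>": 6, "w i d e s t </w>": 3}

num_merges = 10  # in practice tens of thousands of merges are learned
for _ in range(num_merges):
    pairs = get_pair_counts(vocab)
    if not pairs:
        break
    best = pairs.most_common(1)[0][0]
    vocab = merge_pair(best, vocab)

print(vocab)  # rare words are now split into more frequent subword units
```

At translation time, the same learned merge operations are applied to segment unseen words, which is what makes open-vocabulary translation possible.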
Proceedings Article (DOI)

A unified architecture for natural language processing: deep neural networks with multitask learning

TL;DR: This work describes a single convolutional neural network architecture that, given a sentence, outputs a host of language processing predictions: part-of-speech tags, chunks, named entity tags, semantic roles, semantically similar words and the likelihood that the sentence makes sense using a language model.