
Showing papers on "Word embedding published in 2015"


Proceedings Article
25 Jan 2015
TL;DR: A recurrent convolutional neural network for text classification is introduced that requires no human-designed features; its recurrent structure captures contextual information as far as possible when learning word representations, which may introduce considerably less noise than traditional window-based neural networks.
Abstract: Text classification is a foundational task in many NLP applications. Traditional text classifiers often rely on many human-designed features, such as dictionaries, knowledge bases and special tree kernels. In contrast to traditional methods, we introduce a recurrent convolutional neural network for text classification without human-designed features. In our model, we apply a recurrent structure to capture contextual information as far as possible when learning word representations, which may introduce considerably less noise compared to traditional window-based neural networks. We also employ a max-pooling layer that automatically judges which words play key roles in text classification to capture the key components in texts. We conduct experiments on four commonly used datasets. The experimental results show that the proposed method outperforms the state-of-the-art methods on several datasets, particularly on document-level datasets.

1,981 citations


Journal ArticleDOI
TL;DR: It is revealed that much of the performance gain of word embeddings is due to certain system design choices and hyperparameter optimizations rather than the embedding algorithms themselves, and that these modifications can be transferred to traditional distributional models, yielding similar gains.
Abstract: Recent trends suggest that neural-network-inspired word embedding models outperform traditional count-based distributional models on word similarity and analogy detection tasks. We reveal that much of the performance gains of word embeddings are due to certain system design choices and hyperparameter optimizations, rather than the embedding algorithms themselves. Furthermore, we show that these modifications can be transferred to traditional distributional models, yielding similar gains. In contrast to prior reports, we observe mostly local or insignificant performance differences between the methods, with no global advantage to any single approach over the others.

1,374 citations


Journal ArticleDOI
TL;DR: Several important RNN architectures, including Elman, Jordan, and hybrid variants, are implemented with the publicly available Theano neural network toolkit and compared in experiments on the well-known airline travel information system (ATIS) benchmark.
Abstract: Semantic slot filling is one of the most challenging problems in spoken language understanding (SLU). In this paper, we propose to use recurrent neural networks (RNNs) for this task, and present several novel architectures designed to efficiently model past and future temporal dependencies. Specifically, we implemented and compared several important RNN architectures, including Elman, Jordan, and hybrid variants. To facilitate reproducibility, we implemented these networks with the publicly available Theano neural network toolkit and completed experiments on the well-known airline travel information system (ATIS) benchmark. In addition, we compared the approaches on two custom SLU data sets from the entertainment and movies domains. Our results show that the RNN-based models outperform the conditional random field (CRF) baseline by 2% in absolute error reduction on the ATIS benchmark. We improve the state-of-the-art by 0.5% in the Entertainment domain, and 6.7% for the movies domain.

562 citations


Journal ArticleDOI
TL;DR: A machine learning-based approach to extract mentions of adverse drug reactions (ADRs) from highly informal text in social media, suitable for social media mining, as it relies on large volumes of unlabeled data, thus diminishing the need for large, annotated training data sets.

495 citations


Proceedings ArticleDOI
01 Jan 2015
TL;DR: A solution is proposed which normalizes the word vectors on a hypersphere and constrains the linear transform to be an orthogonal transform, offering better performance on a word similarity task and an English-to-Spanish word translation task.
Abstract: Word embedding has been found to be highly effective for translating words from one language to another by a simple linear transform. However, we found some inconsistency among the objective functions of the embedding and the transform learning, as well as the distance measurement. This paper proposes a solution which normalizes the word vectors on a hypersphere and constrains the linear transform to be an orthogonal transform. The experimental results confirmed that the proposed solution can offer better performance on a word similarity task and an English-to-Spanish word translation task.
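The normalize-then-rotate idea described above can be sketched as orthogonal Procrustes over a seed dictionary: length-normalize both embedding sets, then obtain the rotation from an SVD. This is a minimal illustration with random stand-in data, not the authors' released code; all variable names are hypothetical.

```python
import numpy as np

def normalize_rows(M):
    """Project embeddings onto the unit hypersphere (row-wise length normalization)."""
    return M / np.linalg.norm(M, axis=1, keepdims=True)

def learn_orthogonal_map(X_src, Y_tgt):
    """Solve min_W ||X_src W - Y_tgt||_F with W orthogonal (orthogonal Procrustes),
    where rows of X_src / Y_tgt are embeddings of seed-dictionary translation pairs."""
    U, _, Vt = np.linalg.svd(X_src.T @ Y_tgt)
    return U @ Vt

# Toy stand-ins for real source/target embeddings of dictionary pairs
rng = np.random.default_rng(0)
X = normalize_rows(rng.normal(size=(500, 100)))
Y = normalize_rows(rng.normal(size=(500, 100)))
W = learn_orthogonal_map(X, Y)
# A source word x is translated by the nearest cosine neighbour of x @ W
# among the (normalized) target-language vectors.
```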

436 citations


Proceedings ArticleDOI
17 Oct 2015
TL;DR: This work proposes to go from word-level to text-level semantics by combining insights from methods based on external sources of semantic knowledge with word embeddings, and derives multiple types of meta-features from the comparison of the word vectors for short text pairs, and from the vector means of their respective word embeddings.
Abstract: Determining semantic similarity between texts is important in many tasks in information retrieval such as search, query suggestion, automatic summarization and image finding. Many approaches have been suggested, based on lexical matching, handcrafted patterns, syntactic parse trees, external sources of structured semantic knowledge and distributional semantics. However, lexical features, like string matching, do not capture semantic similarity beyond a trivial level. Furthermore, handcrafted patterns and external sources of structured semantic knowledge cannot be assumed to be available in all circumstances and for all domains. Lastly, approaches depending on parse trees are restricted to syntactically well-formed texts, typically of one sentence in length. We investigate whether determining short text similarity is possible using only semantic features---where by semantic we mean, pertaining to a representation of meaning---rather than relying on similarity in lexical or syntactic representations. We use word embeddings, vector representations of terms, computed from unlabelled data, that represent terms in a semantic space in which proximity of vectors can be interpreted as semantic similarity. We propose to go from word-level to text-level semantics by combining insights from methods based on external sources of semantic knowledge with word embeddings. A novel feature of our approach is that an arbitrary number of word embedding sets can be incorporated. We derive multiple types of meta-features from the comparison of the word vectors for short text pairs, and from the vector means of their respective word embeddings. The features representing labelled short text pairs are used to train a supervised learning algorithm. We use the trained model at testing time to predict the semantic similarity of new, unlabelled pairs of short texts. We show on a publicly available evaluation set commonly used for the task of semantic similarity that our method outperforms baseline methods that work under the same conditions.
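A rough sketch of the meta-feature idea for one embedding set: compare the mean vectors of the two short texts and summarize word-to-word cosines, then feed the resulting features to any supervised learner on labelled pairs. The specific features below are illustrative assumptions, not the paper's exact feature set.

```python
import numpy as np

def mean_vector(tokens, emb, dim):
    vecs = [emb[t] for t in tokens if t in emb]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def pair_features(tokens_a, tokens_b, emb, dim=300):
    """A few meta-features for one short-text pair from one word-embedding set."""
    a = mean_vector(tokens_a, emb, dim)
    b = mean_vector(tokens_b, emb, dim)
    cos_means = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
    # All pairwise word-word cosines between the two texts, summarized
    ww = []
    for ta in tokens_a:
        for tb in tokens_b:
            if ta in emb and tb in emb:
                va, vb = emb[ta], emb[tb]
                ww.append(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb) + 1e-9))
    ww = ww or [0.0]
    return np.array([cos_means, np.max(ww), np.mean(ww), np.min(ww)])

# Features from several embedding sets can simply be concatenated and used to
# train any off-the-shelf supervised model on labelled short-text pairs.
```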

426 citations


Proceedings Article
25 Jan 2015
TL;DR: The experimental results show that the TWE models outperform typical word embedding models including the multi-prototype version on contextual word similarity, and also exceed latent topic models and other representative document models on text classification.
Abstract: Most word embedding models typically represent each word using a single vector, which makes these models indiscriminative for ubiquitous homonymy and polysemy. In order to enhance discriminativeness, we employ latent topic models to assign topics for each word in the text corpus, and learn topical word embeddings (TWE) based on both words and their topics. In this way, contextual word embeddings can be flexibly obtained to measure contextual word similarity. We can also build document representations, which are more expressive than some widely-used document models such as latent topic models. In the experiments, we evaluate the TWE models on two tasks, contextual word similarity and text classification. The experimental results show that our models outperform typical word embedding models including the multi-prototype version on contextual word similarity, and also exceed latent topic models and other representative document models on text classification. The source code of this paper can be obtained from https://github.com/largelymfs/topical_word_embeddings.
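The simplest reading of the topical word embedding idea can be sketched with off-the-shelf tools: assign a topic to each token with LDA, then train skip-gram over word#topic pseudo-words. The per-token topic assignment below (document's dominant topic) is a deliberate simplification of the paper's LDA inference, and the toy corpus is made up.

```python
from gensim.corpora import Dictionary
from gensim.models import LdaModel, Word2Vec

# Tiny toy corpus; in practice this would be the full tokenized collection.
docs = [["stock", "market", "prices", "fell", "sharply"],
        ["apple", "released", "a", "new", "phone"],
        ["the", "market", "for", "phone", "apps", "grew"]]

dictionary = Dictionary(docs)
bow = [dictionary.doc2bow(d) for d in docs]
lda = LdaModel(bow, num_topics=2, id2word=dictionary, passes=10)

# Crude per-token topic assignment: tag every token with its document's dominant topic.
tagged = []
for d, b in zip(docs, bow):
    topic = max(lda.get_document_topics(b, minimum_probability=0.0),
                key=lambda t: t[1])[0]
    tagged.append([f"{w}#{topic}" for w in d])

# Skip-gram over word#topic pseudo-words yields topic-specific word vectors.
twe = Word2Vec(tagged, vector_size=50, sg=1, window=3, min_count=1, epochs=20)
```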

414 citations


Proceedings ArticleDOI
01 Jul 2015
TL;DR: PPDB 2.0 includes a discriminatively re-ranked set of paraphrases that achieve a higher correlation with human judgments than PPDB 1.0's heuristic rankings.
Abstract: We present a new release of the Paraphrase Database. PPDB 2.0 includes a discriminatively re-ranked set of paraphrases that achieve a higher correlation with human judgments than PPDB 1.0's heuristic rankings. Each paraphrase pair in the database now also includes fine-grained entailment relations, word embedding similarities, and style annotations.

321 citations


Proceedings Article
Xinxiong Chen1, Lei Xu1, Zhiyuan Liu1, Maosong Sun1, Huanbo Luan1 
25 Jul 2015
TL;DR: A character-enhanced word embedding model (CWE) is presented to address the issues of character ambiguity and non-compositional words, and the effectiveness of CWE on word relatedness computation and analogical reasoning is evaluated.
Abstract: Most word embedding methods take a word as a basic unit and learn embeddings according to words' external contexts, ignoring the internal structures of words. However, in some languages such as Chinese, a word is usually composed of several characters and contains rich internal information. The semantic meaning of a word is also related to the meanings of its composing characters. Hence, we take Chinese for example, and present a character-enhanced word embedding model (CWE). In order to address the issues of character ambiguity and non-compositional words, we propose multiple prototype character embeddings and an effective word selection method. We evaluate the effectiveness of CWE on word relatedness computation and analogical reasoning. The results show that CWE outperforms other baseline methods which ignore internal character information. The codes and data can be accessed from https://github.com/Leonard-Xu/CWE.
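The core composition step of a character-enhanced embedding can be sketched in a few lines: combine a word's own vector with the mean of its characters' vectors. This ignores the paper's multiple-prototype handling of character ambiguity and non-compositional words; names and the 0.5 mixing weight are illustrative assumptions.

```python
import numpy as np

def cwe_vector(word, word_emb, char_emb, dim=100):
    """Character-enhanced word vector: average the word vector with the mean of
    its characters' vectors (a simplified view of CWE; the full model also uses
    multiple prototype character embeddings and word selection)."""
    w = word_emb.get(word, np.zeros(dim))
    chars = [char_emb[c] for c in word if c in char_emb]
    if not chars:
        return w
    return 0.5 * (w + np.mean(chars, axis=0))
```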

265 citations


Proceedings Article
01 Jan 2015
TL;DR: This article proposed density-based distributed embeddings and presented a method for learning representations in the space of Gaussian distributions, which can capture uncertainty about a representation and its relationships, expressing asymmetries more naturally than dot product or cosine similarity.
Abstract: Current work in lexical distributed representations maps each word to a point vector in low-dimensional space. Mapping instead to a density provides many interesting advantages, including better capturing uncertainty about a representation and its relationships, expressing asymmetries more naturally than dot product or cosine similarity, and enabling more expressive parameterization of decision boundaries. This paper advocates for density-based distributed embeddings and presents a method for learning representations in the space of Gaussian distributions. We compare performance on various word embedding benchmarks, investigate the ability of these embeddings to model entailment and other asymmetric relationships, and explore novel properties of the representation.
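One concrete payoff of representing words as densities is an asymmetric similarity such as KL divergence between the two Gaussians, which point vectors with cosine similarity cannot express. Below is the standard closed-form KL between diagonal Gaussians as a worked sketch; it is not the authors' training code.

```python
import numpy as np

def kl_diag_gaussians(mu0, var0, mu1, var1):
    """KL(N(mu0, diag(var0)) || N(mu1, diag(var1))).

    An asymmetric score like this is what makes Gaussian embeddings attractive
    for entailment-style relations: a specific word's density should sit
    "inside" a more general word's density, and the KL in the two directions differs."""
    d = mu0.shape[0]
    return 0.5 * (np.sum(var0 / var1)
                  + np.sum((mu1 - mu0) ** 2 / var1)
                  - d
                  + np.sum(np.log(var1) - np.log(var0)))
```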

262 citations


Proceedings ArticleDOI
09 Aug 2015
TL;DR: A generalized language model is constructed, where the mutual independence between a pair of words (say t and t') no longer holds and the vector embeddings of the words are made use of to derive the transformation probabilities between words.
Abstract: Word2vec, a state-of-the-art word embedding technique has gained a lot of interest in the NLP community. The embedding of the word vectors helps to retrieve a list of words that are used in similar contexts with respect to a given word. In this paper, we focus on using the word embeddings for enhancing retrieval effectiveness. In particular, we construct a generalized language model, where the mutual independence between a pair of words (say t and t') no longer holds. Instead, we make use of the vector embeddings of the words to derive the transformation probabilities between words. Specifically, the event of observing a term t in the query from a document d is modeled by two distinct events, that of generating a different term t', either from the document itself or from the collection, respectively, and then eventually transforming it to the observed query term t. The first event of generating an intermediate term from the document intends to capture how well does a term contextually fit within a document, whereas the second one of generating it from the collection aims to address the vocabulary mismatch problem by taking into account other related terms in the collection. Our experiments, conducted on the standard TREC collection, show that our proposed method yields significant improvements over LM and LDA-smoothed LM baselines.
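The term-transformation step can be sketched as turning embedding similarities into a probability distribution over translating terms: shift cosine similarities to be non-negative and normalize them. This is a hedged illustration of the idea, not the paper's exact estimator or smoothing scheme.

```python
import numpy as np

def translation_probs(term, vocab, emb):
    """P(term | t') for every candidate t' in vocab, derived from embedding
    cosine similarities (negative similarities clipped, then normalized)."""
    v = emb[term]
    sims = np.array([
        emb[t] @ v / (np.linalg.norm(emb[t]) * np.linalg.norm(v) + 1e-9)
        for t in vocab
    ])
    sims = np.clip(sims, 0.0, None)      # keep only positive associations
    return sims / (sims.sum() + 1e-9)
```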

Posted Content
TL;DR: The authors analyze three critical components in training word embeddings: model, corpus, and training parameters, and evaluate each word embedding in three ways: analyzing its semantic properties, using it as a feature for supervised tasks, and using it to initialize neural networks.
Abstract: We analyze three critical components of word embedding training: the model, the corpus, and the training parameters. We systematize existing neural-network-based word embedding algorithms and compare them using the same corpus. We evaluate each word embedding in three ways: analyzing its semantic properties, using it as a feature for supervised tasks and using it to initialize neural networks. We also provide several simple guidelines for training word embeddings. First, we discover that corpus domain is more important than corpus size. We recommend choosing a corpus in a suitable domain for the desired task; after that, using a larger corpus yields better results. Second, we find that faster models provide sufficient performance in most cases, and more complex models can be used if the training corpus is sufficiently large. Third, the early stopping metric for iterating should rely on the development set of the desired task rather than the validation loss of embedding training.

Proceedings ArticleDOI
01 Jul 2015
TL;DR: This paper proposes to learn continuous word embeddings with metadata of category information within cQA pages for question retrieval, employing the Fisher kernel framework to deal with the variable size of word embedding vectors.
Abstract: Community question answering (cQA) has become an important issue due to the popularity of cQA archives on the web. This paper is concerned with the problem of question retrieval. Question retrieval in cQA archives aims to find the existing questions that are semantically equivalent or relevant to the queried questions. However, the lexical gap problem brings about a new challenge for question retrieval in cQA. In this paper, we propose to learn continuous word embeddings with metadata of category information within cQA pages for question retrieval. To deal with the variable size of word embedding vectors, we employ the Fisher kernel framework to aggregate them into fixed-length vectors. Experimental results on a large-scale real-world cQA data set show that our approach can significantly outperform state-of-the-art translation models and topic-based models for question retrieval.

Proceedings ArticleDOI
07 Dec 2015
TL;DR: Objects2action is a semantic word embedding spanned by a skip-gram model of thousands of object categories; a mechanism is proposed to exploit multiple-word descriptions of actions and objects, and the zero-shot approach is shown to extend to the spatio-temporal localization of actions in video.
Abstract: The goal of this paper is to recognize actions in video without the need for examples. Different from traditional zero-shot approaches we do not demand the design and specification of attribute classifiers and class-to-attribute mappings to allow for transfer from seen classes to unseen classes. Our key contribution is objects2action, a semantic word embedding that is spanned by a skip-gram model of thousands of object categories. Action labels are assigned to an object encoding of unseen video based on a convex combination of action and object affinities. Our semantic embedding has three main characteristics to accommodate for the specifics of actions. First, we propose a mechanism to exploit multiple-word descriptions of actions and objects. Second, we incorporate the automated selection of the most responsive objects per action. And finally, we demonstrate how to extend our zero-shot approach to the spatio-temporal localization of actions in video. Experiments on four action datasets demonstrate the potential of our approach.

Proceedings ArticleDOI
01 Jan 2015
TL;DR: A simple wrapper method that uses off-the-shelf word embedding algorithms to learn task-specific bilingual word embeddings that is independent of the choice of embedding algorithm, does not require parallel data, and can be adapted to specific tasks by re-defining the equivalence classes.
Abstract: We introduce a simple wrapper method that uses off-the-shelf word embedding algorithms to learn task-specific bilingual word embeddings. We use a small dictionary of easily-obtainable task-specific word equivalence classes to produce mixed context-target pairs that we use to train off-the-shelf embedding models. Our model has the advantage that it (a) is independent of the choice of embedding algorithm, (b) does not require parallel data, and (c) can be adapted to specific tasks by re-defining the equivalence classes. We show how our method outperforms off-the-shelf bilingual embeddings on the task of unsupervised cross-language partof-speech (POS) tagging, as well as on the task of semi-supervised cross-language super sense (SuS) tagging.
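A simplified sketch of the wrapper idea: map the seed-dictionary words in both monolingual corpora to shared equivalence-class tokens, concatenate the corpora, and train an unmodified embedding model, so that translations end up sharing contexts. The example dictionary and parameters are made up, and the paper's actual mixing of context-target pairs is richer than this.

```python
from gensim.models import Word2Vec

# Hypothetical seed dictionary of task-specific equivalence classes
equiv = {"house": "CLS_house", "casa": "CLS_house",
         "dog": "CLS_dog", "perro": "CLS_dog"}

english = [["the", "dog", "sleeps", "in", "the", "house"]]
spanish = [["el", "perro", "duerme", "en", "la", "casa"]]

# Replace dictionary words with shared class tokens and concatenate the corpora;
# an off-the-shelf word2vec run then places translations in one space because
# they now share (mixed) contexts.
mixed = [[equiv.get(w, w) for w in sent] for sent in english + spanish]
model = Word2Vec(mixed, vector_size=50, sg=1, min_count=1, window=3)
```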

Proceedings ArticleDOI
22 Jun 2015
TL;DR: It is demonstrated that word embedding vectors perform better than binary vectors as a representation of the tags associated with an image, and the CCA model is compared to a simple CNN-based linear regression model, which allows the CNN layers to be trained using back-propagation.
Abstract: We propose simple and effective models for the image annotation that make use of Convolutional Neural Network (CNN) features extracted from an image and word embedding vectors to represent their associated tags. Our first set of models is based on the Canonical Correlation Analysis (CCA) framework that helps in modeling both views - visual features (CNN feature) and textual features (word embedding vectors) of the data. Results on all three variants of the CCA models, namely linear CCA, kernel CCA and CCA with k-nearest neighbor (CCA-KNN) clustering, are reported. The best results are obtained using CCA-KNN which outperforms previous results on the Corel-5k and the ESP-Game datasets and achieves comparable results on the IAPRTC-12 dataset. In our experiments we evaluate CNN features in the existing models which bring out the advantages of it over dozens of handcrafted features. We also demonstrate that word embedding vectors perform better than binary vectors as a representation of the tags associated with an image. In addition we compare the CCA model to a simple CNN based linear regression model, which allows the CNN layers to be trained using back-propagation.
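The linear-CCA variant can be sketched directly with scikit-learn: learn a shared space between CNN image features and the mean word vectors of each image's tags, then annotate by proximity in that space. The data below are random stand-ins and the dimensionalities are assumptions.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

# Stand-ins for real data: CNN features per image and the mean word vector of its tags
rng = np.random.default_rng(0)
cnn_feats = rng.normal(size=(200, 512))   # one row of visual features per training image
tag_feats = rng.normal(size=(200, 300))   # mean of the tags' word embeddings per image

cca = CCA(n_components=50)
cca.fit(cnn_feats, tag_feats)
img_proj, txt_proj = cca.transform(cnn_feats, tag_feats)
# Annotation: project a test image into the shared space and rank candidate tag
# vectors (projected the same way) by cosine similarity, or vote over the
# k nearest training images (the CCA-KNN variant).
```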

Proceedings Article
25 Jul 2015
TL;DR: It is pointed out that SGNS is essentially a representation learning method, which learns to represent the co-occurrence vector for a word, and that extended supervised word embedding can be established based on the proposed representation learning view.
Abstract: Recently significant advances have been witnessed in the area of distributed word representations based on neural networks, which are also known as word embeddings. Among the new word embedding models, skip-gram negative sampling (SGNS) in the word2vec toolbox has attracted much attention due to its simplicity and effectiveness. However, the principles of SGNS remain not well understood, except for a recent work that explains SGNS as an implicit matrix factorization of the pointwise mutual information (PMI) matrix. In this paper, we provide a new perspective for further understanding SGNS. We point out that SGNS is essentially a representation learning method, which learns to represent the co-occurrence vector for a word. Based on the representation learning view, SGNS is in fact an explicit matrix factorization (EMF) of the words' co-occurrence matrix. Furthermore, extended supervised word embedding can be established based on our proposed representation learning view.
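The matrix-factorization view connects neural embeddings to the count-based pipeline: build a word-context co-occurrence matrix, weight it with (positive) PMI, and factorize it. The sketch below uses plain SVD on PPMI as the count-based analogue of what SGNS factorizes; it is an illustration of the connection, not the paper's EMF algorithm.

```python
import numpy as np

corpus = [["the", "cat", "sat", "on", "the", "mat"],
          ["the", "dog", "sat", "on", "the", "rug"]]
window = 2

vocab = sorted({w for s in corpus for w in s})
idx = {w: i for i, w in enumerate(vocab)}
C = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if i != j:
                C[idx[w], idx[sent[j]]] += 1

total = C.sum()
pw = C.sum(axis=1, keepdims=True) / total
pc = C.sum(axis=0, keepdims=True) / total
with np.errstate(divide="ignore"):
    pmi = np.log((C / total) / (pw * pc))
ppmi = np.maximum(pmi, 0)                      # positive PMI; -inf from zero counts clipped to 0

U, S, Vt = np.linalg.svd(ppmi)
dim = 5
word_vectors = U[:, :dim] * np.sqrt(S[:dim])   # low-rank word representations
```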

Proceedings ArticleDOI
01 Jun 2015
TL;DR: A simple model for lexical substitution, based on the popular skip-gram word embedding model, which is efficient, very simple to implement, and at the same time achieves state-of-the-art results in an unsupervised setting.
Abstract: The lexical substitution task requires identifying meaning-preserving substitutes for a target word instance in a given sentential context. Since its introduction in SemEval-2007, various models addressed this challenge, mostly in an unsupervised setting. In this work we propose a simple model for lexical substitution, which is based on the popular skip-gram word embedding model. The novelty of our approach is in leveraging explicitly the context embeddings generated within the skip-gram model, which were so far considered only as an internal component of the learning process. Our model is efficient, very simple to implement, and at the same time achieves state-of-the-art results on lexical substitution tasks in an unsupervised setting.
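A sketch of one way to score substitutes with both the word (target) embeddings and the context embeddings of a skip-gram model, in the spirit of the approach described above. Which toolkit matrix actually holds the context vectors is toolkit-specific and left as an assumption here; the function and variable names are hypothetical.

```python
import numpy as np

def add_cos_score(sub, target, context_words, W, Ctx, vocab):
    """Average-cosine substitute score: cosine of the substitute with the target
    word vector plus its cosines with the context words' *context* vectors.

    W    : word (input) embedding matrix, rows indexed via vocab
    Ctx  : context (output) embedding matrix from the same skip-gram model
           (where this matrix lives depends on the toolkit; check before use).
    """
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
    s = W[vocab[sub]]
    score = cos(s, W[vocab[target]])
    for c in context_words:
        score += cos(s, Ctx[vocab[c]])
    return score / (len(context_words) + 1)
```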

Proceedings ArticleDOI
08 Dec 2015
TL;DR: This paper used neural word embeddings within the well known translation language model for information retrieval, which captures implicit semantic relations between the words in queries and those in relevant documents, thus producing more accurate estimations of document relevance.
Abstract: Recent advances in neural language models have contributed new methods for learning distributed vector representations of words (also called word embeddings). Two such methods are the continuous bag-of-words model and the skipgram model. These methods have been shown to produce embeddings that capture higher order relationships between words that are highly effective in natural language processing tasks involving the use of word similarity and word analogy. Despite these promising results, there has been little analysis of the use of these word embeddings for retrieval. Motivated by these observations, in this paper, we set out to determine how these word embeddings can be used within a retrieval model and what the benefit might be. To this aim, we use neural word embeddings within the well known translation language model for information retrieval. This language model captures implicit semantic relations between the words in queries and those in relevant documents, thus producing more accurate estimations of document relevance. The word embeddings used to estimate neural language models produce translations that differ from previous translation language model approaches; differences that deliver improvements in retrieval effectiveness. The models are robust to choices made in building word embeddings and, even more so, our results show that embeddings do not even need to be produced from the same corpus being used for retrieval.

Proceedings ArticleDOI
01 Jan 2015
TL;DR: This paper proposes a novel approach to train word embeddings to capture antonyms by utilizing supervised synonym and antonym information from thesauri, as well as distributional information from large-scale unlabelled text data.
Abstract: This paper proposes a novel approach to train word embeddings to capture antonyms. Word embeddings have shown to capture synonyms and analogies. Such word embeddings, however, cannot capture antonyms since they depend on the distributional hypothesis. Our approach utilizes supervised synonym and antonym information from thesauri, as well as distributional information from large-scale unlabelled text data. The evaluation results on the GRE antonym question task show that our model outperforms the state-of-the-art systems and it can answer the antonym questions in the F-score of 89%.

Proceedings ArticleDOI
01 Jan 2015
TL;DR: Experimental results show that, in combination with a back-off method based on string similarity, word embeddings outperform a method using count-based distributional similarity.
Abstract: This paper presents the first attempt to use word embeddings to predict the compositionality of multiword expressions. We consider both single- and multi-prototype word embeddings. Experimental results show that, in combination with a back-off method based on string similarity, word embeddings outperform a method using count-based distributional similarity. Our best results are competitive with, or superior to, state-of-the-art methods over three standard compositionality datasets, which include two types of multiword expressions and two languages.
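For the single-prototype case, the basic compositionality measure reduces to comparing the vector learned for the whole expression with the composition of its parts. The sketch below uses additive composition and cosine similarity; it is a generic illustration of this family of measures, not the paper's exact scoring function.

```python
import numpy as np

def compositionality_score(mwe_vec, component_vecs):
    """Cosine between the vector learned for the multiword expression (trained as a
    single token) and the sum of its components' vectors. Low scores suggest
    idiomatic expressions (e.g. "couch potato"); high scores suggest compositional
    ones (e.g. "climate change")."""
    comp = np.sum(component_vecs, axis=0)
    return float(mwe_vec @ comp /
                 (np.linalg.norm(mwe_vec) * np.linalg.norm(comp) + 1e-9))
```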

Posted Content
TL;DR: This study proposes to use a BLSTM-RNN with word embeddings for the part-of-speech (POS) tagging task, and without morphological features it achieves performance comparable with the Stanford POS tagger.
Abstract: Bidirectional Long Short-Term Memory Recurrent Neural Network (BLSTM-RNN) has been shown to be very effective for tagging sequential data, e.g. speech utterances or handwritten documents, while word embedding has been demonstrated to be a powerful representation for characterizing the statistical properties of natural language. In this study, we propose to use a BLSTM-RNN with word embeddings for the part-of-speech (POS) tagging task. When tested on the Penn Treebank WSJ test set, a state-of-the-art tagging accuracy of 97.40% is achieved. Without using morphological features, this approach can also achieve performance comparable with the Stanford POS tagger.
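A minimal PyTorch sketch of this kind of tagger, assuming a pretrained embedding matrix is available: a bidirectional LSTM over embedded tokens followed by a per-token linear layer. The sizes and names are illustrative; training (cross-entropy over tags) is omitted.

```python
import numpy as np
import torch
import torch.nn as nn

class BLSTMTagger(nn.Module):
    """Bidirectional-LSTM tagger over pretrained word embeddings."""
    def __init__(self, emb_matrix, n_tags, hidden=200):
        super().__init__()
        weights = torch.tensor(emb_matrix, dtype=torch.float)
        self.emb = nn.Embedding.from_pretrained(weights, freeze=False)
        self.lstm = nn.LSTM(emb_matrix.shape[1], hidden,
                            batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_tags)

    def forward(self, token_ids):                  # (batch, seq_len) of token indices
        h, _ = self.lstm(self.emb(token_ids))      # (batch, seq_len, 2*hidden)
        return self.out(h)                         # (batch, seq_len, n_tags) tag scores

# Toy usage with a random stand-in for a pretrained embedding matrix
emb = np.random.randn(1000, 100).astype("float32")
model = BLSTMTagger(emb, n_tags=45)
scores = model(torch.randint(0, 1000, (2, 7)))     # two sentences of length 7
```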

Proceedings ArticleDOI
01 Jul 2015
TL;DR: This work compares the performance of two state-of-the-art word embedding methods, namely word2vec and GloVe, on a basic task of reflecting semantic similarity and relatedness of biomedical concepts.
Abstract: Recently there has been a surge of interest in learning vector representations of words from huge corpora in an unsupervised manner. Such word vector representations, also known as word embeddings, have been shown to improve the performance of machine learning models in several NLP tasks. However, the efficiency of such representations has not been systematically evaluated in the biomedical domain. In this work our aim is to compare the performance of two state-of-the-art word embedding methods, namely word2vec and GloVe, on a basic task of reflecting semantic similarity and relatedness of biomedical concepts. For this, vector representations of all unique words in a corpus of more than 1 million full-length research articles in the biomedical domain are obtained from the two methods. These word vectors are evaluated for their ability to reflect semantic similarity and semantic relatedness of word pairs in a benchmark data set of manually curated semantically similar and related words available at http://rxinformatics.umn.edu. We observe that the parameters of these models do affect their ability to capture lexico-semantic properties, and that word2vec with a particular language-modeling setting seems to perform better than the others.
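The evaluation protocol used in studies like this one is straightforward to reproduce: compute cosine similarities for the benchmark word pairs and correlate them with the human ratings using Spearman's rho. A minimal sketch, assuming the embeddings are available as a dict of vectors:

```python
import numpy as np
from scipy.stats import spearmanr

def evaluate_similarity(pairs, human_scores, emb):
    """Spearman correlation between embedding cosine similarities and human
    similarity/relatedness ratings for (word1, word2) pairs. Pairs with
    out-of-vocabulary words are skipped (a common, if debatable, choice)."""
    model_scores, gold = [], []
    for (w1, w2), h in zip(pairs, human_scores):
        if w1 in emb and w2 in emb:
            a, b = emb[w1], emb[w2]
            model_scores.append(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
            gold.append(h)
    rho, _ = spearmanr(model_scores, gold)
    return rho
```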

Proceedings Article
30 Jul 2015
TL;DR: A hybrid matrix factorisation model representing users and items as linear combinations of their content features' latent factors outperforms both collaborative and content-based models in cold-start or sparse interaction data scenarios, and performs at least as well as a pure collaborative matrix factorisation model where interaction data is abundant.
Abstract: I present a hybrid matrix factorisation model representing users and items as linear combinations of their content features’ latent factors. The model outperforms both collaborative and content-based models in cold-start or sparse interaction data scenarios (using both user and item metadata), and performs at least as well as a pure collaborative matrix factorisation model where interaction data is abundant. Additionally, feature embeddings produced by the model encode semantic information in a way reminiscent of word embedding approaches, making them useful for a range of related tasks such as tag recommendations.

Posted Content
TL;DR: This work proposes to use BLSTM-RNN for a unified tagging solution that can be applied to various tagging tasks including part-of-speech tagging, chunking and named entity recognition, requiring no task specific knowledge or sophisticated feature engineering.
Abstract: Bidirectional Long Short-Term Memory Recurrent Neural Network (BLSTM-RNN) has been shown to be very effective for modeling and predicting sequential data, e.g. speech utterances or handwritten documents. In this study, we propose to use BLSTM-RNN for a unified tagging solution that can be applied to various tagging tasks including part-of-speech tagging, chunking and named entity recognition. Instead of exploiting specific features carefully optimized for each task, our solution only uses one set of task-independent features and internal representations learnt from unlabeled text for all tasks. Requiring no task-specific knowledge or sophisticated feature engineering, our approach gets nearly state-of-the-art performance in all these three tagging tasks.

Proceedings Article
05 Nov 2015
TL;DR: The results from both 2010 i2b2 and 2014 Semantic Evaluation data showed that the binarized word embedding features outperformed other strategies for deriving distributed word representations and can be adapted to any other clinical natural language processing research.
Abstract: Clinical Named Entity Recognition (NER) is a critical task for extracting important patient information from clinical text to support clinical and translational research. This study explored the neural word embeddings derived from a large unlabeled clinical corpus for clinical NER. We systematically compared two neural word embedding algorithms and three different strategies for deriving distributed word representations. Two neural word embeddings were derived from the unlabeled Multiparameter Intelligent Monitoring in Intensive Care (MIMIC) II corpus (403,871 notes). The results from both 2010 i2b2 and 2014 Semantic Evaluation (SemEval) data showed that the binarized word embedding features outperformed other strategies for deriving distributed word representations. The binarized embedding features improved the F1-score of the Conditional Random Fields based clinical NER system by 2.3% on i2b2 data and 2.4% on SemEval data. The combined feature from the binarized embeddings and the Brown clusters improved the F1-score of the clinical NER system by 2.9% on i2b2 data and 2.7% on SemEval data. Our study also showed that the distributed word embedding features derived from a large unlabeled corpus can be better than the widely used Brown clusters. Further analysis found that the neural word embeddings captured a wide range of semantic relations, which could be discretized into distributed word representations to benefit the clinical NER system. The low-cost distributed feature representation can be adapted to any other clinical natural language processing research.
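One plausible way to turn continuous embeddings into the kind of discrete features a CRF tagger consumes is to threshold each dimension against its corpus statistics. The sketch below is a guess at what "binarized" features could look like, clearly not the paper's exact recipe, and the 0.5-standard-deviation thresholds are assumptions.

```python
import numpy as np

def binarize_embeddings(emb_matrix):
    """Discretize embeddings for CRF-style features: per dimension, mark whether
    a word's value is well above or well below the corpus mean. Each word then
    yields features such as "d17=HIGH" / "d17=LOW" instead of raw floats."""
    mu = emb_matrix.mean(axis=0)
    sd = emb_matrix.std(axis=0)
    hi = emb_matrix > (mu + 0.5 * sd)
    lo = emb_matrix < (mu - 0.5 * sd)
    return hi, lo
```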

Proceedings ArticleDOI
01 Jul 2015
TL;DR: Evaluation using the clinical abbreviation datasets from both the Vanderbilt University and the University of Minnesota showed that neural word embedding features improved the performance of the SVM-based clinical abbreviation disambiguation system.
Abstract: This study examined the use of neural word embeddings for clinical abbreviation disambiguation, a special case of word sense disambiguation (WSD). We investigated three different methods for deriving word embeddings from a large unlabeled clinical corpus: one existing method called Surrounding based embedding feature (SBE), and two newly developed methods: Left-Right surrounding based embedding feature (LR_SBE) and MAX surrounding based embedding feature (MAX_SBE). We then added these word embeddings as additional features to a Support Vector Machines (SVM) based WSD system. Evaluation using the clinical abbreviation datasets from both the Vanderbilt University and the University of Minnesota showed that neural word embedding features improved the performance of the SVM-based clinical abbreviation disambiguation system. More specifically, the new MAX_SBE method outperformed the other two methods and achieved the state-of-the-art performance on both clinical abbreviation datasets.
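Going by the feature names above, a surrounding-based embedding feature aggregates the vectors of the words around the ambiguous abbreviation; the MAX variant presumably uses an element-wise maximum. The sketch below is that reading, stated as an assumption rather than the paper's exact definition.

```python
import numpy as np

def max_sbe_features(left_words, right_words, emb, dim=100):
    """MAX surrounding-based embedding feature (one plausible interpretation):
    element-wise maximum over the embeddings of the words surrounding the
    ambiguous abbreviation, used as an additional SVM feature vector."""
    vecs = [emb[w] for w in left_words + right_words if w in emb]
    if not vecs:
        return np.zeros(dim)
    return np.max(np.stack(vecs), axis=0)
```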

Journal ArticleDOI
TL;DR: This paper presents an oversampling method based on word embedding compositionality which produces meaningful balanced training data and achieves improved results for both sentiment and emotion classification.
Abstract: Text classification often faces the problem of imbalanced training data. This is true in sentiment analysis and particularly prominent in emotion classification where multiple emotion categories are very likely to produce naturally skewed training data. Different sampling methods have been proposed to improve classification performance by reducing the imbalance ratio between training classes. However, data sparseness and the small disjunct problem remain obstacles in generating new samples for minority classes when the data are skewed and limited. Methods to produce meaningful samples for smaller classes rather than simple duplication are essential in overcoming this problem. In this paper, we present an oversampling method based on word embedding compositionality which produces meaningful balanced training data. We first use a large corpus to train a continuous skip-gram model to form a word embedding model maintaining the syntactic and semantic integrity of the word features. Then, a compositional algorithm based on recursive neural tensor networks is used to construct sentence vectors based on the word embedding model. Finally, we use the SMOTE algorithm as an oversampling method to generate samples for the minority classes and produce a fully balanced training set. Evaluation results on two quite different tasks show that the feature composition method and the oversampling method are both important in obtaining improved classification results. Our method effectively addresses the data imbalance issue and consequently achieves improved results for both sentiment and emotion classification.
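The oversampling stage of such a pipeline can be sketched with imbalanced-learn's SMOTE applied to composed sentence vectors. The sentence vectors here are random stand-ins (the paper composes them with a recursive neural tensor network), and the classifier choice is an assumption.

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.svm import LinearSVC

# X: sentence vectors composed from word embeddings; y: skewed emotion labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 50))
y = np.array([0] * 100 + [1] * 20)            # imbalanced toy labels

# SMOTE interpolates new minority-class sentence vectors to balance the classes.
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X, y)
clf = LinearSVC().fit(X_bal, y_bal)           # train on the fully balanced set
```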

Proceedings Article
25 Jan 2015
TL;DR: This paper proposes a representation learning approach to automatically learn useful features for aspect category detection and achieves the state-of-the-art performance and outperforms the best participating team as well as a few strong baselines.
Abstract: User-generated reviews are valuable resources for decision making. Identifying the aspect categories discussed in a given review sentence (e.g., "food" and "service" in restaurant reviews) is an important task of sentiment analysis and opinion mining. Given a predefined aspect category set, most previous research leverages handcrafted features and a classification algorithm to accomplish the task. The crucial step to achieve better performance is feature engineering, which consumes much human effort and may be unstable when the product domain changes. In this paper, we propose a representation learning approach to automatically learn useful features for aspect category detection. Specifically, a semi-supervised word embedding algorithm is first proposed to obtain continuous word representations on a large set of reviews with noisy labels. Afterwards, we propose to generate deeper and hybrid features through neural networks stacked on the word vectors. A logistic regression classifier is finally trained with the hybrid features to predict the aspect category. The experiments are carried out on a benchmark dataset released by SemEval-2014. Our approach achieves the state-of-the-art performance and outperforms the best participating team as well as a few strong baselines.
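Stripped of the stacked neural layers, the end of this pipeline is a logistic regression over embedding-derived sentence features. The sketch below uses plain mean word vectors as the features and random toy embeddings; the paper's semi-supervised embeddings and deeper hybrid features are not reproduced here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def sentence_vector(tokens, emb, dim=100):
    vecs = [emb[t] for t in tokens if t in emb]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

# Toy embeddings and labels: 0 = "food", 1 = "service" (illustrative only).
emb = {w: np.random.randn(100) for w in ["great", "pasta", "slow", "waiter"]}
sentences = [["great", "pasta"], ["slow", "waiter"]]
X = np.array([sentence_vector(s, emb) for s in sentences])
y = np.array([0, 1])

clf = LogisticRegression().fit(X, y)   # aspect category classifier over sentence vectors
```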

Proceedings ArticleDOI
Arne Köhn1
01 Sep 2015
TL;DR: It is shown that all embedding approaches behave similarly in this task, with dependency-based embeddings performing best; this effect is even more pronounced when generating low-dimensional embeddings.
Abstract: In the last two years, there has been a surge of word embedding algorithms and research on them. However, evaluation has mostly been carried out on a narrow set of tasks, mainly word similarity/relatedness and word relation similarity and on a single language, namely English. We propose an approach to evaluate embeddings on a variety of languages that also yields insights into the structure of the embedding space by investigating how well word embeddings cluster along different syntactic features. We show that all embedding approaches behave similarly in this task, with dependency-based embeddings performing best. This effect is even more pronounced when generating low dimensional embeddings.