Topic

Word embedding

About: Word embedding is a research topic. Over its lifetime, 4,683 publications have been published on this topic, receiving 153,378 citations. The topic is also known as: word embeddings.


Papers
Journal ArticleDOI
TL;DR: In this article, the authors combine word representations and deep visual features in a globally trainable deep convolutional neural network for fine-grained image classification, adopting an attention mechanism to compute the relevance between each recognized word and the given image.
Abstract: Text in natural images carries rich semantics that is often highly relevant to the depicted objects or scene. In this paper, we focus on the problem of fully exploiting scene text for visual understanding. The main idea is to combine word representations and deep visual features in a globally trainable deep convolutional neural network. First, recognized words are obtained from a scene text reading system. Next, we combine the word embeddings of the recognized words and the deep visual features into a single representation, which is optimized by a convolutional neural network for fine-grained image classification. In our framework, an attention mechanism is adopted to compute the relevance between each recognized word and the given image, which further enhances recognition performance. We have performed experiments on two datasets: the Con-Text dataset and the Drink Bottle dataset, proposed for fine-grained classification of business places and drink bottles, respectively. The experimental results consistently demonstrate that the proposed method of combining textual and visual cues significantly outperforms classification with the visual representation alone. Moreover, we show that the learned representation improves retrieval performance on the drink bottle images by a large margin, making it potentially useful in product search.
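
To make the attention step concrete, here is a minimal sketch of the idea, not the authors' implementation: all module names, feature dimensions, and the single-query dot-product attention are illustrative assumptions. A projected global image feature scores each recognized word's embedding, the attended text feature is concatenated with the visual feature, and a linear head classifies the result.

# Minimal sketch (assumptions, not the paper's code) of attending over
# recognized-word embeddings with a global image feature, then fusing.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextVisualAttentionClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim=300, visual_dim=2048, num_classes=28):
        super().__init__()
        self.word_embed = nn.Embedding(vocab_size, embed_dim)
        # Project the visual feature into word-embedding space for scoring.
        self.visual_proj = nn.Linear(visual_dim, embed_dim)
        self.classifier = nn.Linear(visual_dim + embed_dim, num_classes)

    def forward(self, visual_feat, word_ids, word_mask):
        # visual_feat: (B, visual_dim) pooled CNN features
        # word_ids:    (B, T) indices of words read by a scene-text system
        # word_mask:   (B, T) 1 for real words, 0 for padding
        words = self.word_embed(word_ids)                           # (B, T, E)
        query = self.visual_proj(visual_feat)                       # (B, E)
        scores = torch.bmm(words, query.unsqueeze(2)).squeeze(2)    # (B, T)
        scores = scores.masked_fill(word_mask == 0, float('-inf'))
        attn = F.softmax(scores, dim=1)             # relevance of each word
        text_feat = torch.bmm(attn.unsqueeze(1), words).squeeze(1)  # (B, E)
        return self.classifier(torch.cat([visual_feat, text_feat], dim=1))

# Toy usage with random features standing in for CNN and OCR outputs.
model = TextVisualAttentionClassifier(vocab_size=5000)
logits = model(torch.randn(2, 2048), torch.randint(0, 5000, (2, 5)), torch.ones(2, 5))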

53 citations

Proceedings ArticleDOI
01 Jul 2015
TL;DR: This paper presents embedding models that achieve an F-score of 92% on the widely used, publicly available GRE "most contrasting word" dataset, and examines several basic concerns in modeling contrasting meaning, with detailed analysis.
Abstract: Contrasting meaning is a basic aspect of semantics. Recent word-embedding models based on the distributional semantics hypothesis are known to be weak at modeling lexical contrast. We present embedding models that achieve an F-score of 92% on the widely used, publicly available GRE "most contrasting word" dataset (Mohammad et al., 2008). This is the highest performance reported so far on this dataset. Perhaps surprisingly, and unlike most previous work, where relatedness statistics learned from corpora are claimed to yield extra gains over lexicon-based models, we obtained our best result relying solely on lexical resources (Roget's and WordNet); corpus statistics did not lead to further improvement. However, this should not simply be taken to mean that distributional statistics are not useful. We examine several basic concerns in modeling contrasting meaning and provide detailed analysis, with the aim of shedding light on future directions for this basic semantics modeling problem.
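
For a concrete, heavily simplified picture of the lexicon-based approach, the toy scorer below answers a GRE-style "most contrasting word" question using only WordNet antonym links (via NLTK). The function names are invented for illustration; the paper's models additionally draw on Roget's and handle indirect contrast.

# Toy illustration (not the paper's model) of lexicon-based contrast scoring
# for a GRE-style "most contrasting word" question. Requires `nltk` with the
# WordNet data downloaded.
from nltk.corpus import wordnet as wn

def antonym_lemmas(word):
    """Collect lemma names that WordNet lists as antonyms of any sense of `word`."""
    antonyms = set()
    for synset in wn.synsets(word):
        for lemma in synset.lemmas():
            for ant in lemma.antonyms():
                antonyms.add(ant.name())
    return antonyms

def most_contrasting(target, candidates):
    """Prefer candidates that WordNet marks as direct antonyms of the target."""
    ants = antonym_lemmas(target)
    for cand in candidates:
        if cand in ants:
            return cand
    # Real systems back off to indirect contrast (antonyms of synonyms,
    # Roget's categories, or distributional similarity); omitted here.
    return candidates[0]

print(most_contrasting("hot", ["warm", "cold", "boiling", "tepid"]))  # -> "cold"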

53 citations

Proceedings ArticleDOI
06 Nov 2017
TL;DR: A deep node embedding method called IGE (Interaction Graph Embedding) is proposed; its attribute encoding network can be tailored to each dataset as long as it is differentiable, in which case it is trained together with the prediction networks by back-propagation.
Abstract: Node embedding techniques have gained prominence because they produce continuous, low-dimensional features that are effective for various tasks. Most existing approaches learn node embeddings by exploring the structure of networks and focus mainly on static, non-attributed graphs. However, many real-world applications, such as stock markets and public review websites, involve bipartite graphs with dynamic, attributed edges, called attributed interaction graphs. Unlike conventional graph data, attributed interaction graphs involve two kinds of entities (e.g., investors/stocks and users/businesses) and edges representing temporal interactions with attributes (e.g., transactions and reviews). In this paper, we study the problem of node embedding in attributed interaction graphs. Learning embeddings in interaction graphs is highly challenging due to the dynamics and heterogeneous attributes of edges: each edge can carry an entirely different meaning depending on when the interaction occurs and which attributes it bears. We propose a deep node embedding method called IGE (Interaction Graph Embedding). IGE is composed of three neural networks: an encoding network transforms attributes into a fixed-length vector to deal with their heterogeneity; the encoded attribute vectors then interact with nodes multiplicatively in two coupled prediction networks that capture temporal dependency by treating the incident edges of a node as the analogue of a sentence in word embedding methods. The encoding network can be designed specifically for each dataset as long as it is differentiable, in which case it is trained together with the prediction networks by back-propagation. We evaluate the proposed method against a range of competing methods on four real-world datasets. The experimental results demonstrate the effectiveness of the embeddings learned by IGE on both node clustering and classification tasks.
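
The following sketch illustrates the core IGE idea in simplified form; the dimensions, the two-layer encoder, and the use of a single prediction network (the paper describes two coupled ones) are assumptions made for brevity.

# Minimal sketch (assumptions, not the authors' code) of IGE's idea: encode
# heterogeneous edge attributes to a fixed-length vector, combine it
# multiplicatively with a node embedding, and predict a context node from the
# node's interaction "sentence", skip-gram style.
import torch
import torch.nn as nn

class IGESketch(nn.Module):
    def __init__(self, num_nodes, embed_dim=64, attr_dim=10):
        super().__init__()
        self.node_embed = nn.Embedding(num_nodes, embed_dim)
        # Encoding network: any differentiable map from raw attributes
        # to a fixed-length vector can be dropped in here.
        self.attr_encoder = nn.Sequential(
            nn.Linear(attr_dim, embed_dim), nn.ReLU(),
            nn.Linear(embed_dim, embed_dim),
        )
        self.out = nn.Linear(embed_dim, num_nodes)  # softmax over context nodes

    def forward(self, node_ids, edge_attrs):
        h = self.node_embed(node_ids)          # (B, E)
        a = self.attr_encoder(edge_attrs)      # (B, E)
        return self.out(h * a)                 # multiplicative interaction

model = IGESketch(num_nodes=1000)
logits = model(torch.randint(0, 1000, (32,)), torch.randn(32, 10))
loss = nn.functional.cross_entropy(logits, torch.randint(0, 1000, (32,)))
loss.backward()  # the encoder trains jointly with prediction by back-propagation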

53 citations

Journal ArticleDOI
TL;DR: A new multimodal deep learning framework for event detection from videos is proposed, leveraging recent advances in deep neural networks; a novel fusion technique integrates the different data representations at two levels, frame-level and video-level.
Abstract: Real-world applications usually encounter data with various modalities, each containing valuable information. To enhance these applications, it is essential to effectively analyze all the information extracted from the different data modalities, whereas most existing learning models ignore some data types and focus on a single modality. This paper presents a new multimodal deep learning framework for event detection from videos that leverages recent advances in deep neural networks. First, several deep learning models are used to extract useful information from multiple modalities, among them pre-trained Convolutional Neural Networks (CNNs) for visual and audio feature extraction and a word embedding model for textual analysis. Then, a novel fusion technique is proposed that integrates the different data representations at two levels, namely frame-level and video-level. Unlike existing multimodal learning algorithms, the proposed framework can reason about a missing data type using the other available data modalities. The framework is applied to a new video dataset containing natural disaster classes. The experimental results illustrate its effectiveness compared to single-modality deep learning models as well as conventional fusion techniques. Specifically, the final accuracy is improved by more than 16% and 7% over the best results from the single-modality and fusion models, respectively.
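
A schematic of the two-level fusion might look like the sketch below; all shapes, the concatenation-based fusion rule, and the mean temporal pooling are illustrative assumptions rather than the paper's exact design.

# Schematic sketch (assumptions, not the paper's architecture) of two-level
# fusion: per-frame visual/audio features are fused first, pooled over time
# to video level, then fused with a text (word-embedding) feature.
import torch
import torch.nn as nn

class TwoLevelFusion(nn.Module):
    def __init__(self, vis_dim=2048, aud_dim=128, txt_dim=300, num_classes=7):
        super().__init__()
        self.frame_fuse = nn.Linear(vis_dim + aud_dim, 512)  # frame-level fusion
        self.video_fuse = nn.Linear(512 + txt_dim, 256)      # video-level fusion
        self.head = nn.Linear(256, num_classes)

    def forward(self, vis, aud, txt):
        # vis: (B, T, vis_dim) per-frame CNN features
        # aud: (B, T, aud_dim) per-frame audio features
        # txt: (B, txt_dim)    word-embedding summary of associated text
        frame = torch.relu(self.frame_fuse(torch.cat([vis, aud], dim=2)))
        video = frame.mean(dim=1)            # temporal pooling to video level
        fused = torch.relu(self.video_fuse(torch.cat([video, txt], dim=1)))
        return self.head(fused)

# A missing modality could, in principle, be imputed from the others before
# fusion; that mechanism from the paper is not sketched here.
out = TwoLevelFusion()(torch.randn(2, 16, 2048), torch.randn(2, 16, 128), torch.randn(2, 300))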

53 citations

Posted Content
TL;DR: CODER embeddings closely reflect the semantic similarity and relatedness of medical concepts and can be used for embedding-based medical term normalization or to provide features for machine learning.
Abstract: This paper proposes CODER: contrastive learning on knowledge graphs for cross-lingual medical term representation. CODER is designed for medical term normalization: it provides close vector representations for different terms that denote the same or similar medical concepts, with cross-lingual support. We train CODER via contrastive learning on a medical knowledge graph (KG), the Unified Medical Language System, where similarities are calculated using both terms and relation triplets from the KG. Training with relations injects medical knowledge into the embeddings and aims to provide better machine learning features. We evaluate CODER on zero-shot term normalization, semantic similarity, and relation classification benchmarks, which show that CODER outperforms various state-of-the-art biomedical word embeddings, concept embeddings, and contextual embeddings. Our code and models are available at this https URL.
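
The contrastive objective can be sketched as an InfoNCE-style loss over term pairs that name the same concept; the in-batch negative sampling and the temperature below are assumptions, and the released CODER model additionally trains on relation triplets.

# Conceptual sketch (not the released CODER model) of the contrastive idea:
# pull embeddings of terms naming the same concept together and push
# different concepts apart with an InfoNCE-style loss.
import torch
import torch.nn.functional as F

def contrastive_term_loss(anchor_emb, positive_emb, temperature=0.05):
    """anchor_emb, positive_emb: (B, D); row i of each names the same concept.
    In-batch negatives: every other row is treated as a different concept."""
    a = F.normalize(anchor_emb, dim=1)
    p = F.normalize(positive_emb, dim=1)
    logits = a @ p.t() / temperature       # (B, B) cosine similarities
    targets = torch.arange(a.size(0))      # matching rows are the positives
    return F.cross_entropy(logits, targets)

# Toy usage with random tensors standing in for a term encoder's outputs.
loss = contrastive_term_loss(torch.randn(16, 128, requires_grad=True),
                             torch.randn(16, 128, requires_grad=True))
loss.backward()  # would update the term encoder in a real training loop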

53 citations


Network Information
Related Topics (5)
Recurrent neural network
29.2K papers, 890K citations
87% related
Unsupervised learning
22.7K papers, 1M citations
86% related
Deep learning
79.8K papers, 2.1M citations
85% related
Reinforcement learning
46K papers, 1M citations
84% related
Graph (abstract data type)
69.9K papers, 1.2M citations
84% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    317
2022    716
2021    736
2020    1,025
2019    1,078
2018    788