scispace - formally typeset
Topic

Word embedding

About: Word embedding is a research topic. Over its lifetime, 4,683 publications have appeared on this topic, receiving 153,378 citations. The topic is also known as: word embeddings.


Papers
Proceedings ArticleDOI
01 Jul 2018
TL;DR: The authors investigate the behavior of maps learned by machine translation methods and find that the underlying maps are non-linear, and that the locally linear maps vary by an amount that is tightly correlated with the distance between the neighborhoods on which they are trained.
Abstract: We investigate the behavior of maps learned by machine translation methods. The maps translate words by projecting between word embedding spaces of different languages. We locally approximate these maps using linear maps, and find that they vary across the word embedding space. This demonstrates that the underlying maps are non-linear. Importantly, we show that the locally linear maps vary by an amount that is tightly correlated with the distance between the neighborhoods on which they are trained. Our results can be used to test non-linear methods, and to drive the design of more accurate maps for word translation.
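The local linear approximation described above can be sketched as a least-squares fit per neighborhood; the toy embeddings, dimensions, and "true" maps below are hypothetical illustration, not the paper's actual setup:

```python
import numpy as np

def local_linear_map(X_src, Y_tgt):
    """Fit a linear map W (least squares) so that X_src @ W ~ Y_tgt.

    X_src, Y_tgt: (n_pairs, dim) embeddings of translation pairs
    drawn from one neighborhood of the source space.
    """
    W, *_ = np.linalg.lstsq(X_src, Y_tgt, rcond=None)
    return W

# Toy illustration: two distant neighborhoods whose best linear maps
# differ -- the kind of variation the paper measures.
rng = np.random.default_rng(0)
X_a = rng.normal(size=(20, 4))
X_b = rng.normal(size=(20, 4)) + 5.0           # a distant neighborhood
W_true_a, W_true_b = np.eye(4), 2 * np.eye(4)  # different "local" maps
W_a = local_linear_map(X_a, X_a @ W_true_a)
W_b = local_linear_map(X_b, X_b @ W_true_b)
print(np.linalg.norm(W_a - W_b))  # a large gap: no single global linear map
```

Comparing the fitted maps across neighborhoods, as the authors do, directly exposes the non-linearity of the underlying translation map.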

37 citations

Proceedings ArticleDOI
14 Jun 2020
TL;DR: The core idea is to leverage human visual cues to localize objects that are interacting with humans; the proposed model outperforms existing methods at detecting interacting objects and generalizes well to novel objects.
Abstract: We aim to detect human interactions with novel objects through zero-shot learning. Unlike previous works, we allow unseen object categories by using their semantic word embeddings. To do so, we design a human-object region proposal network specifically for the human-object interaction detection task. The core idea is to leverage human visual cues to localize objects that are interacting with humans. We show that our proposed model outperforms existing methods at detecting interacting objects, and generalizes well to novel objects. To recognize objects from unseen categories, we devise a zero-shot classification module on top of the classifier for seen categories. It uses the classifier logits for the seen categories to estimate a vector in the semantic space, and then performs a nearest-neighbor search to find the closest unseen category. We validate our method on the V-COCO and HICO-DET datasets, and obtain superior results on detecting human interactions with both seen and unseen objects.
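The logits-to-semantic-vector step can be sketched as below; this minimal toy assumes a softmax-weighted average of seen-class embeddings followed by a cosine nearest-neighbor search, and the function name, vectors, and class labels are all hypothetical:

```python
import numpy as np

def zero_shot_predict(logits, seen_emb, unseen_emb):
    """Map seen-class logits to a point in the word-embedding space,
    then return the index of the nearest unseen class.

    logits:     (n_seen,)     classifier scores for seen categories
    seen_emb:   (n_seen, d)   word embeddings of seen category names
    unseen_emb: (n_unseen, d) embeddings of candidate unseen names
    """
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                      # softmax over seen classes
    v = probs @ seen_emb                      # estimated semantic vector
    sims = (unseen_emb @ v) / (
        np.linalg.norm(unseen_emb, axis=1) * np.linalg.norm(v))
    return int(np.argmax(sims))               # nearest unseen category

# Toy check: logits peaked on "cat" should land near "tiger", not "truck".
seen = np.array([[1.0, 0.0], [0.0, 1.0]])    # "cat", "car"
unseen = np.array([[0.9, 0.1], [0.1, 0.9]])  # "tiger", "truck"
print(zero_shot_predict(np.array([5.0, 0.0]), seen, unseen))
```

The key property is that no classifier weights are ever trained for the unseen categories; only their word embeddings are needed.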

37 citations

Book ChapterDOI
03 Jun 2018
TL;DR: SIGNet, as discussed by the authors, is a fast, scalable embedding method suitable for signed networks; it builds upon the traditional word2vec family of embedding approaches and adds a targeted node sampling strategy to maintain structural balance in higher-order neighborhoods.
Abstract: Recent successes in word embedding and document embedding have motivated researchers to explore similar representations for networks and to use such representations for tasks such as edge prediction, node label prediction, and community detection. Such network embedding methods are largely focused on finding distributed representations for unsigned networks and are unable to discover embeddings that respect polarities inherent in edges. We propose SIGNet, a fast scalable embedding method suitable for signed networks. Our proposed objective function aims to carefully model the social structure implicit in signed networks by reinforcing the principles of social balance theory. Our method builds upon the traditional word2vec family of embedding approaches and adds a new targeted node sampling strategy to maintain structural balance in higher-order neighborhoods. We demonstrate the superiority of SIGNet over state-of-the-art methods proposed for both signed and unsigned networks on several real world datasets from different domains. In particular, SIGNet offers an approach to generate a richer vocabulary of features of signed networks to support representation and reasoning.
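The sign-aware objective behind such methods can be illustrated with a tiny gradient-ascent sketch. This is a generic signed-embedding toy, not SIGNet itself (which adds targeted node sampling for higher-order balance); the function name, hyperparameters, and edge list are assumptions:

```python
import numpy as np

def train_signed_embeddings(edges, n_nodes, dim=8, lr=0.1, epochs=200, seed=0):
    """Toy sign-aware embedding: for an edge (u, v, s) with sign
    s in {+1, -1}, maximize log sigmoid(s * <e_u, e_v>) -- positive
    edges pull embeddings together, negative edges push them apart.
    """
    rng = np.random.default_rng(seed)
    E = rng.normal(scale=0.1, size=(n_nodes, dim))
    for _ in range(epochs):
        for u, v, s in edges:
            # d/dx log sigmoid(s*x) = s * sigmoid(-s*x), x = <e_u, e_v>
            g = s / (1.0 + np.exp(s * (E[u] @ E[v])))
            eu = E[u].copy()                 # use pre-update value for v
            E[u] += lr * g * E[v]
            E[v] += lr * g * eu
    return E

# Node 0 has a positive edge to node 1 and a negative edge to node 2.
E = train_signed_embeddings([(0, 1, +1), (0, 2, -1)], n_nodes=3)
print(E[0] @ E[1], E[0] @ E[2])  # friend dot product high, foe dot low
```

After training, the dot product respects edge polarity, which is the property unsigned embedding methods cannot express.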

36 citations

Journal ArticleDOI
TL;DR: This paper uses word embedding techniques and prior-knowledge lexicons to automatically construct a Chinese semantic lexicon suitable for personality analysis. It then analyzes the correlations between personality traits and semantic categories of words, and builds personality recognition models using classification algorithms.
Abstract: Personality is one of the fundamental and stable individual characteristics that can be detected from human behavioral data. With the rise of social media, increasing attention has been paid to the ability to recognize personality traits by analyzing the contents of user-generated text. Existing studies have used general psychological lexicons or machine learning, and even deep learning models, to predict personality, but their performance has been relatively poor or they have lacked the ability to interpret personality. In this paper, we present a novel interpretable personality recognition model based on a personality lexicon. First, we use word embedding techniques and prior-knowledge lexicons to automatically construct a Chinese semantic lexicon suitable for personality analysis. Based on this personality lexicon, we analyze the correlations between personality traits and semantic categories of words, and extract the semantic features of users’ microblogs to construct personality recognition models using classification algorithms. Extensive experiments demonstrate that the proposed model achieves significantly better performance than previous approaches.
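The lexicon-construction step, assigning words to semantic categories by embedding similarity, might be sketched as follows; the seed lists, category names, and two-dimensional vectors are all hypothetical toy data, not the paper's lexicon:

```python
import numpy as np

def assign_to_category(word_vec, category_seeds):
    """Assign a word to the semantic category whose seed-word centroid
    is closest in cosine similarity -- a minimal sketch of growing a
    lexicon from prior-knowledge seed lists.
    """
    best, best_sim = None, -2.0
    for name, seed_vecs in category_seeds.items():
        c = np.mean(seed_vecs, axis=0)        # centroid of seed words
        sim = (word_vec @ c) / (np.linalg.norm(word_vec) * np.linalg.norm(c))
        if sim > best_sim:
            best, best_sim = name, sim
    return best

seeds = {
    "sociable": [np.array([1.0, 0.2]), np.array([0.8, 0.0])],
    "anxious":  [np.array([0.0, 1.0]), np.array([0.2, 0.9])],
}
print(assign_to_category(np.array([0.9, 0.1]), seeds))
```

Category membership decided this way stays human-readable, which is what makes the downstream personality model interpretable.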

36 citations

Proceedings ArticleDOI
16 Jun 2019
TL;DR: The SuperTML method is proposed, which borrows the idea of the Super Characters method and two-dimensional embeddings to address the problem of classification on tabular data, and achieves state-of-the-art results on both large and small datasets.
Abstract: Tabular data is the most commonly used form of data in industry, according to a Kaggle ML and DS Survey. Gradient Boosting Trees, Support Vector Machines, Random Forests, and Logistic Regression are typically used for classification tasks on tabular data. DNN models using categorical embeddings are also applied to this task, but all attempts thus far have used one-dimensional embeddings. The recent Super Characters method, which uses two-dimensional word embeddings, achieved state-of-the-art results in text classification tasks, showcasing the promise of this new approach. In this paper, we propose the SuperTML method, which borrows the idea of the Super Characters method and two-dimensional embeddings to address the problem of classification on tabular data. For each input row of tabular data, the features are first projected into a two-dimensional embedding like an image, and this image is then fed into fine-tuned two-dimensional CNN models for classification. The proposed SuperTML method handles categorical data and missing values in tabular data automatically, without any need to pre-process them into numerical values. Comparisons of model performance are conducted on one of the largest and most active competitions on the Kaggle platform, as well as on the top three most popular datasets in the UCI Machine Learning Repository. Experimental results show that the proposed SuperTML method has achieved state-of-the-art results on both large and small datasets.
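The projection of a table row into a two-dimensional "image" can be sketched as below. This is a crude NumPy stand-in that writes character codes into fixed cells rather than rendering text glyphs as SuperTML actually does; the canvas and cell sizes are assumptions:

```python
import numpy as np

def tml_image(features, size=32, cell=8):
    """Render one row of tabular features into a 2-D array,
    SuperTML-style: each feature string occupies one cell of the
    canvas, its characters written as pixel intensities. Missing
    values (None) simply leave their cell blank -- no numeric
    preprocessing is needed.
    """
    img = np.zeros((size, size), dtype=np.float32)
    per_row = size // cell                       # cells per canvas row
    for i, feat in enumerate(features):
        r, c = (i // per_row) * cell, (i % per_row) * cell
        text = "" if feat is None else str(feat)
        for j, ch in enumerate(text[:cell]):     # clip to cell capacity
            img[r + j, c] = ord(ch) / 255.0
    return img

# One row with a numeric feature, a categorical one, and a missing value;
# the result is a grayscale "image" ready for a 2-D CNN.
img = tml_image(["5.1", "setosa", None, "0.2"])
print(img.shape)
```

Because every feature is treated as a string drawn onto the canvas, categorical values and missing entries need no special encoding, which matches the claim in the abstract.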

36 citations


Network Information
Related Topics (5)
Recurrent neural network: 29.2K papers, 890K citations (87% related)
Unsupervised learning: 22.7K papers, 1M citations (86% related)
Deep learning: 79.8K papers, 2.1M citations (85% related)
Reinforcement learning: 46K papers, 1M citations (84% related)
Graph (abstract data type): 69.9K papers, 1.2M citations (84% related)
Performance Metrics
No. of papers in the topic in previous years

Year    Papers
2023    317
2022    716
2021    736
2020    1,025
2019    1,078
2018    788