scispace - formally typeset
Topic

Word embedding

About: Word embedding is a research topic. Over its lifetime, 4,683 publications have appeared on this topic, receiving 153,378 citations. The topic is also known as: word embeddings.


Papers
Proceedings ArticleDOI
01 Jul 2018
TL;DR: The authors investigate the behavior of maps learned by machine translation methods and find that the underlying maps are non-linear, and that the locally linear maps vary by an amount that is tightly correlated with the distance between the neighborhoods on which they are trained.
Abstract: We investigate the behavior of maps learned by machine translation methods. The maps translate words by projecting between word embedding spaces of different languages. We locally approximate these maps using linear maps, and find that they vary across the word embedding space. This demonstrates that the underlying maps are non-linear. Importantly, we show that the locally linear maps vary by an amount that is tightly correlated with the distance between the neighborhoods on which they are trained. Our results can be used to test non-linear methods, and to drive the design of more accurate maps for word translation.
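The local linear approximation described above can be sketched as a least-squares fit per neighborhood; the toy embeddings, dimensions, and "true" maps below are hypothetical illustration, not the paper's actual setup:

```python
import numpy as np

def local_linear_map(X_src, Y_tgt):
    """Fit a linear map W (least squares) so that X_src @ W ~ Y_tgt.

    X_src, Y_tgt: (n_pairs, dim) embeddings of translation pairs
    drawn from one neighborhood of the source space.
    """
    W, *_ = np.linalg.lstsq(X_src, Y_tgt, rcond=None)
    return W

# Toy illustration: two distant neighborhoods whose best linear maps
# differ -- the kind of variation the paper measures.
rng = np.random.default_rng(0)
X_a = rng.normal(size=(20, 4))
X_b = rng.normal(size=(20, 4)) + 5.0           # a distant neighborhood
W_true_a, W_true_b = np.eye(4), 2 * np.eye(4)  # different "local" maps
W_a = local_linear_map(X_a, X_a @ W_true_a)
W_b = local_linear_map(X_b, X_b @ W_true_b)
print(np.linalg.norm(W_a - W_b))  # a large gap: no single global linear map
```

Comparing the fitted maps across neighborhoods, as the authors do, directly exposes the non-linearity of the underlying translation map.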

37 citations

Proceedings ArticleDOI
14 Jun 2020
TL;DR: The core idea is to leverage human visual cues to localize objects that are interacting with humans; the proposed model outperforms existing methods at detecting interacting objects and generalizes well to novel objects.
Abstract: We aim to detect human interactions with novel objects through zero-shot learning. Unlike previous works, we allow unseen object categories by using their semantic word embeddings. To do so, we design a human-object region proposal network specifically for the human-object interaction detection task. The core idea is to leverage human visual cues to localize objects that are interacting with humans. We show that our proposed model outperforms existing methods at detecting interacting objects, and generalizes well to novel objects. To recognize objects from unseen categories, we devise a zero-shot classification module on top of the classifier for seen categories. It uses the classifier logits for the seen categories to estimate a vector in the semantic space, and then performs a nearest-neighbor search to find the closest unseen category. We validate our method on the V-COCO and HICO-DET datasets, and obtain superior results on detecting human interactions with both seen and unseen objects.
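The logits-to-semantic-vector step can be sketched as below; this minimal toy assumes a softmax-weighted average of seen-class embeddings followed by a cosine nearest-neighbor search, and the function name, vectors, and class labels are all hypothetical:

```python
import numpy as np

def zero_shot_predict(logits, seen_emb, unseen_emb):
    """Map seen-class logits to a point in the word-embedding space,
    then return the index of the nearest unseen class.

    logits:     (n_seen,)     classifier scores for seen categories
    seen_emb:   (n_seen, d)   word embeddings of seen category names
    unseen_emb: (n_unseen, d) embeddings of candidate unseen names
    """
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                      # softmax over seen classes
    v = probs @ seen_emb                      # estimated semantic vector
    sims = (unseen_emb @ v) / (
        np.linalg.norm(unseen_emb, axis=1) * np.linalg.norm(v))
    return int(np.argmax(sims))               # nearest unseen category

# Toy check: logits peaked on "cat" should land near "tiger", not "truck".
seen = np.array([[1.0, 0.0], [0.0, 1.0]])    # "cat", "car"
unseen = np.array([[0.9, 0.1], [0.1, 0.9]])  # "tiger", "truck"
print(zero_shot_predict(np.array([5.0, 0.0]), seen, unseen))
```

The key property is that no classifier weights are ever trained for the unseen categories; only their word embeddings are needed.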

37 citations

Book ChapterDOI
03 Jun 2018
TL;DR: SIGNet, as discussed by the authors, is a fast, scalable embedding method suitable for signed networks; it builds upon the traditional word2vec family of embedding approaches and adds a targeted node sampling strategy to maintain structural balance in higher-order neighborhoods.
Abstract: Recent successes in word embedding and document embedding have motivated researchers to explore similar representations for networks and to use such representations for tasks such as edge prediction, node label prediction, and community detection. Such network embedding methods are largely focused on finding distributed representations for unsigned networks and are unable to discover embeddings that respect polarities inherent in edges. We propose SIGNet, a fast scalable embedding method suitable for signed networks. Our proposed objective function aims to carefully model the social structure implicit in signed networks by reinforcing the principles of social balance theory. Our method builds upon the traditional word2vec family of embedding approaches and adds a new targeted node sampling strategy to maintain structural balance in higher-order neighborhoods. We demonstrate the superiority of SIGNet over state-of-the-art methods proposed for both signed and unsigned networks on several real world datasets from different domains. In particular, SIGNet offers an approach to generate a richer vocabulary of features of signed networks to support representation and reasoning.
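The sign-aware objective behind such methods can be illustrated with a tiny gradient-ascent sketch. This is a generic signed-embedding toy, not SIGNet itself (which adds targeted node sampling for higher-order balance); the function name, hyperparameters, and edge list are assumptions:

```python
import numpy as np

def train_signed_embeddings(edges, n_nodes, dim=8, lr=0.1, epochs=200, seed=0):
    """Toy sign-aware embedding: for an edge (u, v, s) with sign
    s in {+1, -1}, maximize log sigmoid(s * <e_u, e_v>) -- positive
    edges pull embeddings together, negative edges push them apart.
    """
    rng = np.random.default_rng(seed)
    E = rng.normal(scale=0.1, size=(n_nodes, dim))
    for _ in range(epochs):
        for u, v, s in edges:
            # d/dx log sigmoid(s*x) = s * sigmoid(-s*x), x = <e_u, e_v>
            g = s / (1.0 + np.exp(s * (E[u] @ E[v])))
            eu = E[u].copy()                 # use pre-update value for v
            E[u] += lr * g * E[v]
            E[v] += lr * g * eu
    return E

# Node 0 has a positive edge to node 1 and a negative edge to node 2.
E = train_signed_embeddings([(0, 1, +1), (0, 2, -1)], n_nodes=3)
print(E[0] @ E[1], E[0] @ E[2])  # friend dot product high, foe dot low
```

After training, the dot product respects edge polarity, which is the property unsigned embedding methods cannot express.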

36 citations

Journal ArticleDOI
TL;DR: This paper uses word embedding techniques and prior-knowledge lexicons to automatically construct a Chinese semantic lexicon suitable for personality analysis. It then analyzes the correlations between personality traits and semantic categories of words, and builds personality recognition models using classification algorithms.
Abstract: Personality is one of the fundamental and stable individual characteristics that can be detected from human behavioral data. With the rise of social media, increasing attention has been paid to the ability to recognize personality traits by analyzing the contents of user-generated text. Existing studies have used general psychological lexicons or machine learning, and even deep learning models, to predict personality, but their performance has been relatively poor or they have lacked the ability to interpret personality. In this paper, we present a novel interpretable personality recognition model based on a personality lexicon. First, we use word embedding techniques and prior-knowledge lexicons to automatically construct a Chinese semantic lexicon suitable for personality analysis. Based on this personality lexicon, we analyze the correlations between personality traits and semantic categories of words, and extract the semantic features of users’ microblogs to construct personality recognition models using classification algorithms. Extensive experiments demonstrate that the proposed model achieves significantly better performance than previous approaches.
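The lexicon-construction step, assigning words to semantic categories by embedding similarity, might be sketched as follows; the seed lists, category names, and two-dimensional vectors are all hypothetical toy data, not the paper's lexicon:

```python
import numpy as np

def assign_to_category(word_vec, category_seeds):
    """Assign a word to the semantic category whose seed-word centroid
    is closest in cosine similarity -- a minimal sketch of growing a
    lexicon from prior-knowledge seed lists.
    """
    best, best_sim = None, -2.0
    for name, seed_vecs in category_seeds.items():
        c = np.mean(seed_vecs, axis=0)        # centroid of seed words
        sim = (word_vec @ c) / (np.linalg.norm(word_vec) * np.linalg.norm(c))
        if sim > best_sim:
            best, best_sim = name, sim
    return best

seeds = {
    "sociable": [np.array([1.0, 0.2]), np.array([0.8, 0.0])],
    "anxious":  [np.array([0.0, 1.0]), np.array([0.2, 0.9])],
}
print(assign_to_category(np.array([0.9, 0.1]), seeds))
```

Category membership decided this way stays human-readable, which is what makes the downstream personality model interpretable.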

36 citations

Proceedings ArticleDOI
16 Jun 2019
TL;DR: The SuperTML method is proposed, which borrows the idea of the Super Characters method and two-dimensional embeddings to address the problem of classification on tabular data, and achieves state-of-the-art results on both large and small datasets.
Abstract: Tabular data is the most commonly used form of data in industry, according to a Kaggle ML and DS Survey. Gradient Boosting Trees, Support Vector Machines, Random Forests, and Logistic Regression are typically used for classification tasks on tabular data. DNN models using categorical embeddings are also applied to this task, but all attempts thus far have used one-dimensional embeddings. The recent Super Characters method, which uses two-dimensional word embeddings, achieved state-of-the-art results in text classification tasks, showcasing the promise of this new approach. In this paper, we propose the SuperTML method, which borrows the idea of the Super Characters method and two-dimensional embeddings to address the problem of classification on tabular data. For each input row of tabular data, the features are first projected into a two-dimensional embedding like an image, and this image is then fed into fine-tuned two-dimensional CNN models for classification. The proposed SuperTML method handles categorical data and missing values in tabular data automatically, without any need to pre-process them into numerical values. Comparisons of model performance are conducted on one of the largest and most active competitions on the Kaggle platform, as well as on the top three most popular datasets in the UCI Machine Learning Repository. Experimental results show that the proposed SuperTML method has achieved state-of-the-art results on both large and small datasets.
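The projection of a table row into a two-dimensional "image" can be sketched as below. This is a crude NumPy stand-in that writes character codes into fixed cells rather than rendering text glyphs as SuperTML actually does; the canvas and cell sizes are assumptions:

```python
import numpy as np

def tml_image(features, size=32, cell=8):
    """Render one row of tabular features into a 2-D array,
    SuperTML-style: each feature string occupies one cell of the
    canvas, its characters written as pixel intensities. Missing
    values (None) simply leave their cell blank -- no numeric
    preprocessing is needed.
    """
    img = np.zeros((size, size), dtype=np.float32)
    per_row = size // cell                       # cells per canvas row
    for i, feat in enumerate(features):
        r, c = (i // per_row) * cell, (i % per_row) * cell
        text = "" if feat is None else str(feat)
        for j, ch in enumerate(text[:cell]):     # clip to cell capacity
            img[r + j, c] = ord(ch) / 255.0
    return img

# One row with a numeric feature, a categorical one, and a missing value;
# the result is a grayscale "image" ready for a 2-D CNN.
img = tml_image(["5.1", "setosa", None, "0.2"])
print(img.shape)
```

Because every feature is treated as a string drawn onto the canvas, categorical values and missing entries need no special encoding, which matches the claim in the abstract.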

36 citations


Network Information
Related Topics (5)
Recurrent neural network: 29.2K papers, 890K citations (87% related)
Unsupervised learning: 22.7K papers, 1M citations (86% related)
Deep learning: 79.8K papers, 2.1M citations (85% related)
Reinforcement learning: 46K papers, 1M citations (84% related)
Graph (abstract data type): 69.9K papers, 1.2M citations (84% related)
Performance Metrics
No. of papers in the topic in previous years

Year    Papers
2023    317
2022    716
2021    736
2020    1,025
2019    1,078
2018    788