Open Access, Proceedings Article

Incorporating Metadata into Content-Based User Embeddings.

TLDR
This work proposes a data augmentation method that allows novel feature types to be used within off-the-shelf embedding models, and shows that this approach can lead to substantial performance gains with the simple addition of network and geographic features.
Abstract
Low-dimensional vector representations of social media users can benefit applications like recommendation systems and user attribute inference. Recent work has shown that user embeddings can be improved by combining different types of information, such as text and network data. We propose a data augmentation method that allows novel feature types to be used within off-the-shelf embedding models. Experimenting with the task of friend recommendation on a dataset of 5,019 Twitter users, we show that our approach can lead to substantial performance gains with the simple addition of network and geographic features.
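
The abstract does not spell out how the augmentation works. As a rough illustration only, the sketch below assumes one plausible reading: metadata such as friend links and a coarse location are appended to a user's text as pseudo-word tokens, so that any text-only embedding model sees them as ordinary vocabulary. The function names, the `__friend_…__` and `__geo_…__` token formats, and the example users are all hypothetical, and a bag-of-words cosine similarity stands in for a real embedding model.

```python
from collections import Counter
from math import sqrt


def augment(tokens, friends=(), region=None, repeat=1):
    """Append hypothetical pseudo-word tokens for metadata to a user's text tokens.

    The token formats are assumptions made for this sketch, not the paper's.
    """
    extra = [f"__friend_{f}__" for f in friends]
    if region is not None:
        extra.append(f"__geo_{region}__")
    # `repeat` lets the metadata tokens be weighted more heavily than the text.
    return list(tokens) + extra * repeat


def cosine(a, b):
    """Bag-of-words cosine similarity, a stand-in for an embedding model."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


# Two made-up users whose text shares no words.
alice_text = ["love", "hiking", "in", "the", "rockies"]
bob_text = ["new", "espresso", "machine", "arrived"]

# Both follow the same (hypothetical) account and share a region.
alice_doc = augment(alice_text, friends=["carol"], region="US-CO")
bob_doc = augment(bob_text, friends=["carol"], region="US-CO")

text_only = cosine(alice_text, bob_text)
with_metadata = cosine(alice_doc, bob_doc)
print(f"text only: {text_only:.3f}, with metadata tokens: {with_metadata:.3f}")
```

Under these assumptions, the two users are unrelated on text alone but become similar once the shared metadata tokens enter their documents, which is the kind of signal a friend recommendation task could exploit.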


Citations
Book

Actes de la conférence Traitement Automatique de la Langue Naturelle, TALN 2018

TL;DR: This article presents an information extraction method which collects additional information on the web so as to enrich already existing information and then fill in a knowledge base using lexical and syntactical patterns.
Proceedings Article

RP-DNN : a Tweet level propagation context based deep neural networks for early rumor detection in social media

TL;DR: The authors proposed a novel hybrid neural network architecture, which combines a task-specific character-based bidirectional language model and stacked Long Short-Term Memory (LSTM) networks to represent textual contents and social-temporal contexts of input source tweets for modeling propagation patterns of rumors in the early stages of their development.
Proceedings Article

Party Matters: Enhancing Legislative Embeddings with Author Attributes for Vote Prediction

TL;DR: This article proposes a novel neural method for encoding documents alongside additional metadata, achieving an average 4% boost in accuracy over the previous state of the art.
Proceedings Article

Detecting Trending Terms in Cybersecurity Forum Discussions.

TL;DR: This work presents a lightweight method for identifying currently trending terms in relation to a known prior of terms, using a weighted log-odds ratio with an informative prior, and finds this method outperforms TF-IDF on information retrieval.
Posted Content

RP-DNN: A Tweet level propagation context based deep neural networks for early rumor detection in Social Media

TL;DR: A novel hybrid neural network architecture is presented, which combines a task-specific character-based bidirectional language model and stacked Long Short-Term Memory networks to represent textual contents and social-temporal contexts of input source tweets, for modelling propagation patterns of rumors in the early stages of their development.