Incorporating Metadata into Content-Based User Embeddings.
Linzi Xing, Michael J. Paul
pp. 45-49
TL;DR: This work proposes a data augmentation method that allows novel feature types to be used within off-the-shelf embedding models, and shows that this approach can lead to substantial performance gains with the simple addition of network and geographic features.
Abstract:
Low-dimensional vector representations of social media users can benefit applications like recommendation systems and user attribute inference. Recent work has shown that user embeddings can be improved by combining different types of information, such as text and network data. We propose a data augmentation method that allows novel feature types to be used within off-the-shelf embedding models. Experimenting with the task of friend recommendation on a dataset of 5,019 Twitter users, we show that our approach can lead to substantial performance gains with the simple addition of network and geographic features.
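The abstract does not spell out how the extra features are encoded. As a minimal sketch, assuming that network and geographic metadata are injected into each user's text as extra pseudo-tokens, so that an unmodified document-embedding model treats them as ordinary words, the corpus preparation and a cosine-similarity friend ranking might look like the following. The token prefixes `__friend_` and `__geo_` and the helpers `augment_user_document`, `build_corpus`, `cosine` and `recommend_friends` are hypothetical illustrations, not the paper's code.

```python
import math
from typing import Dict, List, Mapping, Optional, Sequence, Set

# Hypothetical prefixes that keep metadata pseudo-tokens from colliding with real words.
FRIEND_PREFIX = "__friend_"
GEO_PREFIX = "__geo_"


def augment_user_document(
    tokens: Sequence[str],
    friend_ids: Sequence[str],
    location: Optional[str],
) -> List[str]:
    """Return the user's text tokens followed by metadata pseudo-tokens (assumed scheme)."""
    augmented = list(tokens)
    # Network feature: one pseudo-token per distinct friend, sorted for reproducibility.
    augmented.extend(FRIEND_PREFIX + fid for fid in sorted(set(friend_ids)))
    # Geographic feature: one normalized pseudo-token for the user's location, if known.
    if location:
        augmented.append(GEO_PREFIX + "_".join(location.lower().split()))
    return augmented


def build_corpus(
    texts: Mapping[str, Sequence[str]],
    friends: Mapping[str, Sequence[str]],
    locations: Mapping[str, str],
) -> Dict[str, List[str]]:
    """Augment every user's document; users without metadata keep only their text."""
    return {
        user: augment_user_document(tokens, friends.get(user, []), locations.get(user))
        for user, tokens in texts.items()
    }


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend_friends(
    user: str,
    embeddings: Mapping[str, Sequence[float]],
    existing: Set[str],
    k: int,
) -> List[str]:
    """Rank users not already in `existing` by embedding similarity to `user` (ties by id)."""
    scored = [
        (cosine(embeddings[user], vec), other)
        for other, vec in embeddings.items()
        if other != user and other not in existing
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:k]]


corpus = build_corpus(
    {"u1": ["coffee", "in", "seattle"], "u2": ["go", "hawks"]},
    {"u1": ["u2", "u3", "u2"]},
    {"u1": "Seattle WA"},
)
print(corpus["u1"])  # ['coffee', 'in', 'seattle', '__friend_u2', '__friend_u3', '__geo_seattle_wa']
```

Under this assumption, the augmented token lists could be wrapped in gensim's `TaggedDocument(words, tags)` and passed to `Doc2Vec`, and the resulting per-user vectors are what `recommend_friends` would rank. Encoding metadata as tokens, rather than changing the model, is what would let an off-the-shelf implementation be reused unchanged.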
Citations
Book
Actes de la conférence Traitement Automatique de la Langue Naturelle, TALN 2018
Anne-Laure Ligozat, Peggy Cellier, Anne-Lyse Minard, Vincent Claveau, Cyril Grouin, Patrick Paroubek
TL;DR: This article presents an information extraction method which collects additional information on the web so as to enrich already existing information and then fill in a knowledge base using lexical and syntactical patterns.
Proceedings Article
RP-DNN : a Tweet level propagation context based deep neural networks for early rumor detection in social media
TL;DR: The authors proposed a novel hybrid neural network architecture, which combines a task-specific character-based bidirectional language model and stacked Long Short-Term Memory (LSTM) networks to represent textual contents and social-temporal contexts of input source tweets for modeling propagation patterns of rumors in the early stages of their development.
Proceedings Article
Party Matters: Enhancing Legislative Embeddings with Author Attributes for Vote Prediction
TL;DR: This article proposed a novel neural method for encoding documents alongside additional metadata, achieving an average 4% boost in accuracy over the previous state of the art.
Proceedings Article
Detecting Trending Terms in Cybersecurity Forum Discussions.
TL;DR: This work presents a lightweight method for identifying currently trending terms in relation to a known prior of terms, using a weighted log-odds ratio with an informative prior, and finds this method outperforms TF-IDF on information retrieval.
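The TL;DR names the scoring rule without giving it. As a hedged sketch of the general technique, assuming the Dirichlet-prior formulation of Monroe, Colaresi and Quinn (2008) and not necessarily the exact variant or prior used in the cited work, each term's smoothed log-odds in a current window of posts is compared with its log-odds in a background corpus and divided by an approximate standard error. The function name `weighted_log_odds`, the `default_alpha` fallback and the toy counts are illustrative assumptions.

```python
import math
from typing import Dict, Mapping


def weighted_log_odds(
    current: Mapping[str, int],
    background: Mapping[str, int],
    prior: Mapping[str, float],
    default_alpha: float = 0.01,
) -> Dict[str, float]:
    """z-scored log-odds of each term in `current` vs `background` under a Dirichlet prior.

    `prior` holds positive pseudo-counts alpha_w; terms missing from it fall back to the
    hypothetical `default_alpha`. Assumes at least two terms so every denominator is positive.
    """
    vocab = set(current) | set(background)
    alpha = {w: prior.get(w, default_alpha) for w in vocab}
    alpha_0 = sum(alpha.values())
    n_cur = sum(current.values())
    n_bg = sum(background.values())
    z_scores = {}
    for w in vocab:
        a_w = alpha[w]
        y_cur = current.get(w, 0)
        y_bg = background.get(w, 0)
        # Smoothed log-odds of w in each corpus; the prior keeps unseen terms finite.
        log_odds_cur = math.log((y_cur + a_w) / (n_cur + alpha_0 - y_cur - a_w))
        log_odds_bg = math.log((y_bg + a_w) / (n_bg + alpha_0 - y_bg - a_w))
        delta = log_odds_cur - log_odds_bg
        # Approximate variance of delta; dividing by its square root gives a z-score.
        variance = 1.0 / (y_cur + a_w) + 1.0 / (y_bg + a_w)
        z_scores[w] = delta / math.sqrt(variance)
    return z_scores


z = weighted_log_odds(
    {"ransomware": 30, "the": 100},
    {"ransomware": 5, "the": 1000},
    {"ransomware": 1.0, "the": 10.0},
)
trending = sorted(z, key=z.get, reverse=True)
print(trending[0])  # ransomware
```

Terms with large positive z-scores are over-represented in the current window relative to the background, which is one way to operationalize "trending"; the variance term shrinks scores for rare terms whose log-odds estimates are noisy.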
Posted Content
RP-DNN: A Tweet level propagation context based deep neural networks for early rumor detection in Social Media
TL;DR: A novel hybrid neural network architecture is presented, which combines a task-specific character-based bidirectional language model and stacked Long Short-Term Memory networks to represent textual contents and social-temporal contexts of input source tweets, for modelling propagation patterns of rumors in the early stages of their development.