Topic

Latent Dirichlet allocation

About: Latent Dirichlet allocation is a research topic. Over its lifetime, 5,351 publications have been published on this topic, receiving 212,555 citations. The topic is also known as: LDA.


Papers
Posted Content
TL;DR: Investigation of style and topic aspects of language in online communities shows that style is a better indicator of community identity than topic, even for communities organized around specific topics.
Abstract: This work investigates style and topic aspects of language in online communities: looking at both utility as an identifier of the community and correlation with community reception of content. Style is characterized using a hybrid word and part-of-speech tag n-gram language model, while topic is represented using Latent Dirichlet Allocation. Experiments with several Reddit forums show that style is a better indicator of community identity than topic, even for communities organized around specific topics. Further, there is a positive correlation between the community reception to a contribution and the style similarity to that community, but not so for topic similarity.

30 citations

Proceedings ArticleDOI
27 Jun 2014
TL;DR: This paper identifies the "hidden" patterns of operations conducted by both normal users and malicious users from a large volume of network/systems logs by mapping this problem to the topic modeling problem and leveraging the well-established LDA models and learning algorithms.
Abstract: This paper explores a hybrid approach of intrusion detection through knowledge discovery from big data using Latent Dirichlet Allocation (LDA). We identify the "hidden" patterns of operations conducted by both normal users and malicious users from a large volume of network/systems logs, by mapping this problem to the topic modeling problem and leveraging the well-established LDA models and learning algorithms. This new approach potentially complements the strengths of signature-based and anomaly-based methods.

30 citations

Journal ArticleDOI
01 Jul 2017
TL;DR: A distributed geo-aware streaming latent Dirichlet allocation model is used to track online discussions geographically over time; the detected locations correlated with the actual election locations, and the model provides a better geolocation classification than a keyword-based approach.
Abstract: Tracking how discussion topics evolve in social media and where these topics are discussed geographically over time has the potential to provide useful information for many different purposes. In crisis management, knowing a specific topic's current geographical location could provide vital information about where, or even which, resources should be allocated. This paper describes an attempt to track online discussions geographically over time. A distributed geo-aware streaming latent Dirichlet allocation model was developed for the purpose of recognizing topics' locations in unstructured text. To evaluate the model, it was implemented and used for automatic discovery and geographical tracking of election topics during parts of the 2016 American presidential primary elections. It was shown that the locations correlated with the actual election locations, and that the model provides a better geolocation classification compared to using a keyword-based approach.

29 citations

Journal ArticleDOI
TL;DR: By quantifying the temporal trends of topics at the country/area level, it is found that researchers in different regions tend to focus on different sub-fields, while adjacent countries/areas commonly share research topics.

29 citations

Journal ArticleDOI
01 Feb 2017
TL;DR: The motivation is to assess the effectiveness of support vector networks (SVN) on the task of detecting deception in texts, as well as to investigate to which degree it is possible to build a domain-independent detector of deception in text using SVN.
Abstract: Our motivation is to assess the effectiveness of support vector networks (SVN) on the task of detecting deception in texts, as well as to investigate to which degree it is possible to build a domain-independent detector of deception in text using SVN. We experimented with different feature sets for training the SVN: a continuous semantic space model source represented by the latent Dirichlet allocation topics, a word-space model, and dictionary-based features. In this way, a comparison of performance between semantic information and behavioral information is made. We tested several combinations of these features on different datasets designed to identify deception. The datasets used include the DeRev dataset (a corpus of deceptive and truthful opinions about books obtained from Amazon), OpSpam (a corpus of fake and truthful opinions about hotels), and three corpora on controversial topics (abortion, death penalty, and a best friend) on which the subjects were asked to write an idea contrary to what they really believed. We experimented with one-domain setting by training and testing our models separately on each dataset (with fivefold cross-validation), with mixed-domain setting by merging all datasets into one large corpus (again, with fivefold cross-validation), and with cross-domain setting: using one dataset for testing and a concatenation of all other datasets for training. We obtained an average accuracy of 86% in one-domain setting, 75% in mixed-domain setting, and 52 to 64% in cross-domain setting.

29 citations


Network Information
Related Topics (5)
Cluster analysis
146.5K papers, 2.9M citations
86% related
Support vector machine
73.6K papers, 1.7M citations
86% related
Deep learning
79.8K papers, 2.1M citations
85% related
Feature extraction
111.8K papers, 2.1M citations
84% related
Convolutional neural network
74.7K papers, 2M citations
83% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2023  323
2022  842
2021  418
2020  429
2019  473
2018  446