Topic

Latent Dirichlet allocation

About: Latent Dirichlet allocation is a research topic. Over its lifetime, 5,351 publications have been published within this topic, receiving 212,555 citations. The topic is also known as: LDA.


Papers
Book
26 May 2011
TL;DR: This book presents a comprehensive treatment of visual analysis of behaviour from computational-modelling and algorithm-design perspectives.
Abstract: This book presents a comprehensive treatment of visual analysis of behaviour from computational-modelling and algorithm-design perspectives. Topics: covers learning-group activity models, unsupervised behaviour profiling, hierarchical behaviour discovery, learning behavioural context, modelling rare behaviours, and man-in-the-loop active learning; examines multi-camera behaviour correlation, person re-identification, and connecting-the-dots for abnormal behaviour detection; discusses Bayesian information criterion, Bayesian networks, bag-of-words representation, canonical correlation analysis, dynamic Bayesian networks, Gaussian mixtures, and Gibbs sampling; investigates hidden conditional random fields, hidden Markov models, human silhouette shapes, latent Dirichlet allocation, local binary patterns, locality preserving projection, and Markov processes; explores probabilistic graphical models, probabilistic topic models, space-time interest points, spectral clustering, and support vector machines.

31 citations

Proceedings Article
01 May 2018
TL;DR: This work attempts to distinguish fake reviews from non-fake reviews using linguistic features on the Yelp Filter Dataset and finds, to the authors' surprise, that the linguistic features perform well.
Abstract: Potential consumers need to make decisions based on online reviews. However, this usefulness brings a curse: deceptive opinion spam. Deceptive opinion spam misleads potential customers and organizations reshaping their businesses, and prevents opinion-mining techniques from reaching accurate conclusions. Thus, the detection of fake reviews has attracted increasing attention. In this work, we attempt to distinguish fake reviews from non-fake reviews using linguistic features on the Yelp Filter Dataset. To our surprise, the linguistic features performed well. Further, we propose a method to extract features based on Latent Dirichlet Allocation. The experimental results show that the method is effective.

31 citations

Journal Article
TL;DR: The experimental results have shown that the extracted point-cluster features combined with the multipath classifiers can significantly enhance the classification accuracy, and they have demonstrated the superior performance of the method over other techniques in point-cloud classification.
Abstract: Efficient presentation and recognition of on-ground objects from airborne laser scanning (ALS) point clouds is a challenging task. In this paper, we propose an approach that combines discriminative-dictionary-learning-based sparse coding and latent Dirichlet allocation (LDA) to generate multilevel point-cluster features for ALS point-cloud classification. Our method takes advantage of the labels of the training data and of each dictionary item to enforce discriminability in sparse coding during the dictionary learning process and to represent point-cluster features more accurately. Multipath AdaBoost classifiers are trained with the hierarchical point-cluster features, and we apply them to the classification of unknown points by inheriting the recognition results under different paths. Experiments are performed on different ALS point clouds; the experimental results show that the extracted point-cluster features combined with the multipath classifiers can significantly enhance the classification accuracy, and they demonstrate the superior performance of our method over other techniques in point-cloud classification.

31 citations

Journal Article
TL;DR: This approach outperforms state-of-the-art phishing detection research on an accredited data set, in applications based only on the body of the e-mails, without using other e-mail features such as the header, IP information or number of links in the text.
Abstract: Phishing is a type of fraud attempt in which the attacker, usually by e-mail, pretends to be a trusted person or entity in order to obtain sensitive information from a target. Most recent phishing detection research has focused on obtaining highly distinctive features from the metadata and text of these e-mails. The obtained attributes are then fed to classification algorithms to determine whether the messages are phishing or legitimate. In this paper, an approach based on machine learning is proposed to detect phishing e-mail attacks. The methods that compose this approach rely on a feature engineering process based on natural language processing, lemmatization, topic modeling, improved learning techniques for resampling and cross-validation, and hyperparameter configuration. The first proposed method uses all the features obtained from the Document-Term Matrix (DTM) in the classification algorithms. The second one uses Latent Dirichlet Allocation (LDA) as an operation to deal with the problems of the “curse of dimensionality”, the sparsity, and the text context portion included in the obtained representation. The proposed approach reached an F1-measure of 99.95% using the XGBoost algorithm. It outperforms state-of-the-art phishing detection research on an accredited data set, in applications based only on the body of the e-mails, without using other e-mail features such as the header, IP information or number of links in the text.

31 citations
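
The idea in the phishing abstract above, replacing a wide, sparse document-term representation with a short vector of topic proportions, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a plain collapsed Gibbs sampler for LDA written with the Python standard library only, and the function name `lda_topic_features`, the hyperparameters `alpha` and `beta`, and the toy tokenised e-mails are all hypothetical choices made for illustration.

```python
import random


def lda_topic_features(docs, num_topics=2, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return one topic-proportion vector per document.

    docs is a list of token lists. alpha and beta are symmetric Dirichlet
    hyperparameters (assumed values, not taken from any paper listed here).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]                # tokens of doc d in topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # tokens of word v in topic k
    topic_total = [0] * num_topics                              # tokens in topic k
    assignments = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the current token from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Conditional distribution over topics for this token (unnormalised).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed topic proportions: a num_topics-dimensional feature vector per document.
    features = []
    for d, doc in enumerate(docs):
        denom = len(doc) + num_topics * alpha
        features.append([(doc_topic[d][t] + alpha) / denom for t in range(num_topics)])
    return features


# Hypothetical toy e-mail bodies, already tokenised.
docs = [
    ["verify", "account", "password", "login", "urgent"],
    ["password", "account", "urgent", "verify", "bank"],
    ["meeting", "agenda", "project", "schedule", "team"],
    ["team", "project", "meeting", "notes", "schedule"],
]
features = lda_topic_features(docs, num_topics=2)
for row in features:
    print([round(p, 3) for p in row])
```

Each document is reduced from a vocabulary-sized count vector to `num_topics` numbers that sum to one, and these vectors could then be passed to any classifier. A production pipeline would more likely rely on an established LDA implementation than on a hand-written sampler.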

Journal Article
TL;DR: A semantic text analytics approach that can uncover customers' most basic concerns about their online purchase choices, based on the hypothesis that the product reviews and overall ratings given by the same person within a short time interval are closely related.
Abstract: Merchants, as well as customers, have noticed the importance of online product reviews and numeric ratings on electronic commerce websites. It is valuable for merchants to discover potential customer value from the sheer volume of data. This paper contributes a semantic text analytics approach that can uncover customers' most basic concerns about their online purchase choices. More specifically, based on the hypothesis that the product reviews and overall ratings given by the same person within a short time interval are closely related, we exploit this relationship to uncover the embedded customer value. In the proposed method, taking the single-lens reflex camera as an example, an innovative aspect extraction method that jointly considers the product ontology and the results of the topic modeling method latent Dirichlet allocation is applied. As a result, 8 specific aspects are identified from the experimental results. For each aspect, a self-contained review feature corpus is created as an extension of some seed terms. After aspect-based sentence segmentation and context-sensitive sentiment preprocessing, aspect-oriented sentiment analysis is applied. Multiple regression analysis is then used as a statistical measure to discover determinant aspects of overall ratings. The results reveal that cost performance, image quality and product integrity are the three most influential aspects. The practical implication of our research is that merchants can efficiently modify their products to satisfy more customers and boost sales performance.

31 citations


Network Information
Related Topics (5)
Cluster analysis: 146.5K papers, 2.9M citations, 86% related
Support vector machine: 73.6K papers, 1.7M citations, 86% related
Deep learning: 79.8K papers, 2.1M citations, 85% related
Feature extraction: 111.8K papers, 2.1M citations, 84% related
Convolutional neural network: 74.7K papers, 2M citations, 83% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    323
2022    842
2021    418
2020    429
2019    473
2018    446