Topic

Latent Dirichlet allocation

About: Latent Dirichlet allocation is a research topic. Over its lifetime, 5,351 publications have been published within this topic, receiving 212,555 citations. The topic is also known as: LDA.

Papers

Book

Visual Analysis of Behaviour: From Pixels to Semantics

Shaogang Gong, Tao Xiang

26 May 2011

TL;DR: This book presents a comprehensive treatment of visual analysis of behaviour from computational-modelling and algorithm-design perspectives.

Abstract: This book presents a comprehensive treatment of visual analysis of behaviour from computational-modelling and algorithm-design perspectives. Topics: covers learning-group activity models, unsupervised behaviour profiling, hierarchical behaviour discovery, learning behavioural context, modelling rare behaviours, and man-in-the-loop active learning; examines multi-camera behaviour correlation, person re-identification, and connecting-the-dots for abnormal behaviour detection; discusses Bayesian information criterion, Bayesian networks, bag-of-words representation, canonical correlation analysis, dynamic Bayesian networks, Gaussian mixtures, and Gibbs sampling; investigates hidden conditional random fields, hidden Markov models, human silhouette shapes, latent Dirichlet allocation, local binary patterns, locality preserving projection, and Markov processes; explores probabilistic graphical models, probabilistic topic models, space-time interest points, spectral clustering, and support vector machines.

31 citations

Proceedings Article

Fake reviews detection based on LDA

Shaohua Jia¹, Xianguo Zhang¹, Xinyue Wang¹, Yang Liu¹

Inner Mongolia University¹

01 May 2018

TL;DR: This work attempts to distinguish fake reviews from non-fake reviews using linguistic features on the Yelp Filter Dataset and, to the authors' surprise, the linguistic features performed well.

Abstract: It is necessary for potential consumers to make decisions based on online reviews. However, this usefulness brings forth a curse: deceptive opinion spam. Deceptive opinion spam misleads potential customers and organizations reshaping their businesses, and prevents opinion-mining techniques from reaching accurate conclusions. Thus, the detection of fake reviews has attracted growing attention. In this work, we attempt to find out how to distinguish between fake reviews and non-fake reviews by using linguistic features on the Yelp Filter Dataset. To our surprise, the linguistic features performed well. Further, we propose a method to extract features based on Latent Dirichlet Allocation. The experimental results proved that the method is effective.

31 citations

Journal Article

Discriminative-Dictionary-Learning-Based Multilevel Point-Cluster Features for ALS Point-Cloud Classification

Zhenxin Zhang¹, Liqiang Zhang¹, Xiaohua Tong², Bo Guo³, Liang Zhang¹, Xiaoyue Xing¹

Beijing Normal University¹, Tongji University², Shenzhen University³

31 Aug 2016-IEEE Transactions on Geoscience and Remote Sensing

TL;DR: The experimental results have shown that the extracted point-cluster features combined with the multipath classifiers can significantly enhance the classification accuracy, and they have demonstrated the superior performance of the method over other techniques in point-cloud classification.

Abstract: Efficient presentation and recognition of on-ground objects from airborne laser scanning (ALS) point clouds is a challenging task. In this paper, we propose an approach that combines discriminative-dictionary-learning-based sparse coding and latent Dirichlet allocation (LDA) to generate multilevel point-cluster features for ALS point-cloud classification. Our method takes advantage of the labels of the training data and of each dictionary item to enforce discriminability in sparse coding during the dictionary learning process and to represent point-cluster features more accurately. Multipath AdaBoost classifiers are trained with the hierarchical point-cluster features, and we apply them to the classification of unknown points by the inheritance of the recognition results under different paths. Experiments are performed on different ALS point clouds; the experimental results have shown that the extracted point-cluster features combined with the multipath classifiers can significantly enhance the classification accuracy, and they have demonstrated the superior performance of our method over other techniques in point-cloud classification.

31 citations

Journal Article

From Feature Engineering and Topics Models to Enhanced Prediction Rates in Phishing Detection

Éder S. Gualberto¹, Rafael Timóteo de Sousa Júnior¹, Thiago Pereira de Brito Vieira¹, Joao Paulo C. L. da Costa¹, Cláudio Gottschalg Duque¹

University of Brasília¹

21 Apr 2020-IEEE Access

TL;DR: This approach outperforms state-of-the-art phishing detection research on an accredited data set, in applications based only on the body of the e-mails, without using other e-mail features such as the header, IP information or number of links in the text.

Abstract: Phishing is a type of fraud attempt in which the attacker, usually by e-mail, pretends to be a trusted person or entity in order to obtain sensitive information from a target. Most recent phishing detection research has focused on obtaining highly distinctive features from the metadata and text of these e-mails. The obtained attributes are then used to feed classification algorithms in order to determine whether the e-mails are phishing or legitimate messages. In this paper, an approach based on machine learning is proposed to detect phishing e-mail attacks. The methods that compose this approach are performed through a feature engineering process based on natural language processing, lemmatization, topic modeling, improved learning techniques for resampling and cross-validation, and hyperparameter configuration. The first proposed method uses all the features obtained from the Document-Term Matrix (DTM) in the classification algorithms. The second one uses Latent Dirichlet Allocation (LDA) as an operation to deal with the problems of the “curse of dimensionality”, the sparsity, and the text context portion included in the obtained representation. The proposed approach reached an F1-measure of 99.95% using the XGBoost algorithm. It outperforms state-of-the-art phishing detection research on an accredited data set, in applications based only on the body of the e-mails, without using other e-mail features such as the header, IP information or number of links in the text.

31 citations

Journal Article

The determinants of online customer ratings: a combined domain ontology and topic text analytics approach

Runyu Chen¹, Wei Xu¹

Renmin University of China¹

01 Mar 2017-Electronic Commerce Research

TL;DR: A semantic text analytics approach that can dig out customers’ most basic concerns about their online purchase choices, based on the hypothesis that the product reviews and overall ratings given by the same person within a short time interval are closely related.

Abstract: Merchants, as well as customers, have noticed the importance of online product reviews and numeric ratings on electronic commerce websites. It is valuable if merchants can discover potential customer value from the sheer volume of data. This paper contributes a semantic text analytics approach that can dig out customers' most basic concerns about their online purchase choices. More specifically, based on the hypothesis that the product reviews and overall ratings given by the same person within a short time interval are closely related, we use this relationship to uncover the embedded customer value. In the proposed method, taking the single-lens reflex camera as an example, an innovative aspect extraction method that jointly considers the product ontology and the results of the topic modeling method latent Dirichlet allocation is applied. As a result, 8 specific aspects are identified from the experimental results. For each aspect, a self-contained review feature corpus is created as an extension of some seed terms. After aspect-based sentence segmentation and context-sensitive sentiment preprocessing, aspect-oriented sentiment analysis is applied. Multiple regression analysis is then used as a statistical measure to discover determinant aspects of overall ratings. The results reveal that cost performance, image quality and product integrity are the three most influential aspects. The practical implication of our research is that merchants can efficiently modify their products to satisfy more customers and also boost sales performance.

31 citations

Metrics

Papers: 6,513

Citations: 245,225

No. of papers in the topic in previous years
Year	Papers
2023	323
2022	842
2021	418
2020	429
2019	473
2018	446
