Topic

Latent Dirichlet allocation

About: Latent Dirichlet allocation (LDA) is a research topic. Over its lifetime, 5,351 publications have been published within this topic, receiving 212,555 citations.

Papers

Proceedings Article•DOI•

The FLDA model for aspect-based opinion mining: addressing the cold start problem

Samaneh Moghaddam¹, Martin Ester¹•Institutions (1)

Simon Fraser University¹

13 May 2013

TL;DR: This paper proposes a probabilistic graphical model based on LDA, called Factorized LDA (FLDA), to address the cold start problem and demonstrates the improved effectiveness of the FLDA model in terms of likelihood of the held-out test set.

Abstract: Aspect-based opinion mining from online reviews has attracted a lot of attention recently. The main goal of all of the proposed methods is extracting aspects and/or estimating aspect ratings. Recent works, which are often based on Latent Dirichlet Allocation (LDA), consider both tasks simultaneously. These models are normally trained at the item level, i.e., a model is learned for each item separately. Learning a model per item is fine when the item has been reviewed extensively and has enough training data. However, in real-life data sets such as those from Epinions.com and Amazon.com, more than 90% of items have less than 10 reviews, so-called cold start items. State-of-the-art LDA models for aspect-based opinion mining are trained at the item level and therefore perform poorly for cold start items due to the lack of sufficient training data. In this paper, we propose a probabilistic graphical model based on LDA, called Factorized LDA (FLDA), to address the cold start problem. The underlying assumption of FLDA is that aspects and ratings of a review are influenced not only by the item but also by the reviewer. It further assumes that both items and reviewers can be modeled by a set of latent factors which represent their aspect and rating distributions. Different from state-of-the-art LDA models, FLDA is trained at the category level and learns the latent factors using the reviews of all the items of a category, in particular the non cold start items, and uses them as prior for cold start items. Our experiments on three real-life data sets demonstrate the improved effectiveness of the FLDA model in terms of likelihood of the held-out test set. We also evaluate the accuracy of FLDA based on two application-oriented measures.
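The abstract above reports results as likelihood of a held-out test set. As a rough illustration of what that measure computes, here is a minimal sketch for a plain LDA-style mixture, not for FLDA itself. The function names, the toy topic-word tables, and the `floor` smoothing constant are assumptions made for this example and do not come from the paper.

```python
import math

def heldout_log_likelihood(doc_tokens, theta, phi, floor=1e-12):
    """Log-likelihood of one held-out document under fixed topic estimates.

    theta: list of K topic proportions for the document.
    phi:   list of K dicts mapping word -> probability under that topic.
    floor: assumed smoothing constant so unseen words do not give log(0).
    """
    total = 0.0
    for word in doc_tokens:
        # p(word) = sum over topics of p(topic | doc) * p(word | topic)
        p = sum(theta[k] * phi[k].get(word, 0.0) for k in range(len(theta)))
        total += math.log(max(p, floor))
    return total

def perplexity(log_likelihood, n_tokens):
    """Per-token perplexity; lower values indicate a better fit."""
    return math.exp(-log_likelihood / n_tokens)

# Toy example with two hypothetical topics over review words.
theta = [0.5, 0.5]
phi = [
    {"battery": 0.6, "screen": 0.4},
    {"battery": 0.2, "screen": 0.8},
]
doc = ["battery", "screen"]
ll = heldout_log_likelihood(doc, theta, phi)
ppl = perplexity(ll, len(doc))
```

In this toy case the two word probabilities are 0.4 and 0.6, so the document log-likelihood is log(0.24). A higher held-out likelihood (equivalently, a lower perplexity) suggests that the model generalizes better to unseen reviews.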

81 citations

Journal Article•DOI•

Topic-based social network analysis for virtual communities of interests in the dark web

Gaston L'Huillier¹, Hector Alvarez¹, Sebastián A. Ríos¹, Felipe Aguilera¹•Institutions (1)

University of Chile¹

31 Mar 2011-Sigkdd Explorations

TL;DR: This work addresses the topic-based community key-members extraction problem, for which the method combines both text mining and social network analysis techniques.

Abstract: The study of extremist groups and their interaction is a crucial task in order to maintain homeland security and peace. Tools such as social network analysis and text mining have contributed to their understanding in order to develop counter-terrorism applications. This work addresses the topic-based community key-members extraction problem, for which our method combines both text mining and social network analysis techniques. This is achieved by first applying latent Dirichlet allocation to build two topic-based social networks in online forums: one social network oriented towards the thread creator point-of-view, and the other oriented towards the repliers of the overall forum. Then, by using different network analysis measures, topic-based key members are evaluated using as benchmark a social network built from a plain representation of the network of posts. Experiments were successfully performed using an English language based forum available in the Dark Web portal.
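To make the idea of a topic-based social network more concrete, the following is a minimal sketch of the creator-oriented variant only. It assumes each post already carries a topic distribution from a fitted LDA model. The field names, the `threshold` cutoff, and the use of weighted in-degree as the key-member score are assumptions for illustration; the abstract does not specify which network measures the authors use.

```python
from collections import defaultdict

def topic_networks(posts, threshold=0.5):
    """Build one weighted directed network per topic.

    posts: list of dicts with hypothetical keys 'author', 'thread_creator',
           and 'topics' (list of K topic probabilities for the post).
    An edge replier -> thread creator is added to topic k's network when
    the post's probability for topic k is at least `threshold` (assumed).
    """
    nets = defaultdict(lambda: defaultdict(float))
    for post in posts:
        src, dst = post["author"], post["thread_creator"]
        if src == dst:
            continue  # the opening post of a thread is not a reply
        for k, p in enumerate(post["topics"]):
            if p >= threshold:
                nets[k][(src, dst)] += p
    return nets

def weighted_in_degree(net):
    """Score each member by the total edge weight pointing at them."""
    score = defaultdict(float)
    for (_, dst), weight in net.items():
        score[dst] += weight
    return dict(score)

# Toy forum with two hypothetical topics.
posts = [
    {"author": "bob", "thread_creator": "alice", "topics": [0.8, 0.2]},
    {"author": "carol", "thread_creator": "alice", "topics": [0.6, 0.4]},
    {"author": "alice", "thread_creator": "bob", "topics": [0.1, 0.9]},
]
nets = topic_networks(posts)
scores_topic0 = weighted_in_degree(nets[0])
```

Here `alice` receives all the weight in the topic-0 network, so she would be the candidate key member for that topic under this simple score.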

81 citations

Proceedings Article•DOI•

Effective Multi-Query Expansions: Robust Landmark Retrieval

Yang Wang¹, Xuemin Lin¹, Lin Wu², Wenjie Zhang¹•Institutions (2)

University of New South Wales¹, University of Adelaide²

13 Oct 2015

TL;DR: This paper proposes a novel framework, multi-query expansions, that retrieves semantically robust landmarks in two steps, together with a novel technique to generate a robust yet compact pattern set from the multi-query photos.

Abstract: Given a query photo issued by a user (q-user), landmark retrieval returns a set of photos whose landmarks are similar to those of the query. Existing studies on landmark retrieval focus on exploiting geometries of landmarks for similarity matches between candidate photos and a query photo. We observe that the same landmarks provided by different users may convey different geometry information depending on the viewpoints and/or angles, and may subsequently yield very different results. In fact, dealing with the landmarks with shapes caused by the photography of q-users is often nontrivial and has never been studied. Motivated by this, in this paper we propose a novel framework, namely multi-query expansions, to retrieve semantically robust landmarks in two steps. Firstly, we identify the top-k photos regarding the latent topics of a query landmark to construct a multi-query set so as to remedy its possible shape. For this purpose, we significantly extend the techniques of Latent Dirichlet Allocation. Secondly, we propose a novel technique to generate the robust yet compact pattern set from the multi-query photos. To avoid redundancy and improve efficiency, we adopt the existing minimum-description-length-principle based pattern mining techniques to remove similar query photos from the (k+1) selected query photos. Then, a landmark retrieval rule is developed to calculate the ranking scores between the mined pattern set and each photo in the database, which are ranked to serve as the final ranking list of landmark retrieval. Extensive experiments are conducted on real-world landmark datasets, validating the significantly higher accuracy of our approach.
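The first step described above, selecting the top-k photos by latent topic to form a set of (k+1) query photos, can be sketched roughly as follows. This assumes each photo already has a topic distribution and ranks candidates by cosine similarity; the similarity measure, the function names, and the toy data are assumptions for illustration, since the abstract does not detail how the authors extend LDA.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def multi_query_set(query_id, query_topics, db_topics, k=2):
    """Return the query plus the k database photos closest in topic space.

    db_topics: dict mapping photo id -> topic distribution (list of floats).
    """
    ranked = sorted(
        db_topics,
        key=lambda pid: cosine(query_topics, db_topics[pid]),
        reverse=True,
    )
    return [query_id] + ranked[:k]

# Toy database with hypothetical photo ids and two topics.
db = {
    "photo_a": [0.8, 0.2],
    "photo_b": [0.1, 0.9],
    "photo_c": [0.7, 0.3],
}
expanded = multi_query_set("query", [0.9, 0.1], db, k=2)
```

With these toy vectors, `photo_a` and `photo_c` share the query's dominant topic and are selected, while `photo_b` is left out.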

81 citations

Proceedings Article•DOI•

An Application of Latent Dirichlet Allocation to Analyzing Software Evolution

Erik Linstead¹, Cristina V. Lopes¹, Pierre Baldi¹•Institutions (1)

University of California, Irvine¹

11 Dec 2008

TL;DR: The results demonstrate the effectiveness of probabilistic topic models in automatically summarizing the temporal dynamics of software concerns, with direct application to project management and program understanding, for two large, open source Java projects, Eclipse and Argo UML.

Abstract: We develop and apply unsupervised statistical topic models, in particular latent Dirichlet allocation, to identify functional components of source code and study their evolution over multiple project versions. We present results for two large, open source Java projects, Eclipse and Argo UML, which are well-known and well-studied within the software mining community. Our results demonstrate the effectiveness of probabilistic topic models in automatically summarizing the temporal dynamics of software concerns, with direct application to project management and program understanding. In addition to detecting the emergence of topics on the release timeline which represent integration points for key source code functionality, our techniques can also be used to pinpoint refactoring events in the underlying software design, as well as to identify general programming concepts whose prevalence is dependent only on the size of the code base to be analyzed. Complete results are available from our supplementary materials website at http://sourcerer.ics.uci.edu/icmla2008/software_evolution.html.
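One simple way to picture topic evolution over project versions is to average each topic's share across the source files of a release and flag topics whose share jumps between consecutive releases. The sketch below assumes a single topic model shared across versions (so topic indices are comparable) and a hypothetical `min_jump` cutoff; both are assumptions for illustration rather than the procedure used in the paper.

```python
def topic_share_by_version(doc_topics_by_version):
    """Average topic proportions over the documents of each version.

    doc_topics_by_version: dict mapping version -> list of per-file topic
    distributions (each a list of K proportions from one shared model).
    """
    shares = {}
    for version, thetas in doc_topics_by_version.items():
        n_docs = len(thetas)
        n_topics = len(thetas[0])
        shares[version] = [
            sum(theta[k] for theta in thetas) / n_docs for k in range(n_topics)
        ]
    return shares

def emerging_topics(shares, versions, min_jump=0.2):
    """Flag (version, topic) pairs whose share rose by at least min_jump."""
    flagged = []
    for prev, cur in zip(versions, versions[1:]):
        for k, (before, after) in enumerate(zip(shares[prev], shares[cur])):
            if after - before >= min_jump:
                flagged.append((cur, k))
    return flagged

# Toy project with two releases, two files each, and two topics.
doc_topics = {
    "1.0": [[0.9, 0.1], [0.7, 0.3]],
    "2.0": [[0.4, 0.6], [0.6, 0.4]],
}
shares = topic_share_by_version(doc_topics)
flagged = emerging_topics(shares, ["1.0", "2.0"])
```

In this toy data topic 1 grows from an average share of about 0.2 to about 0.5 between releases, so it is flagged as emerging in version 2.0.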

80 citations

Journal Article•DOI•

Configuring latent Dirichlet allocation based feature location

Lauren R. Biggers¹, Cecylia Bocovich², Riley Capshaw³, Brian P. Eddy¹, Letha H. Etzkorn⁴, Nicholas A. Kraft¹•Institutions (4)

University of Alabama¹, University of Waterloo², Hendrix College³, University of Alabama in Huntsville⁴

01 Jun 2014-Empirical Software Engineering

TL;DR: The key findings are that exclusion of comments and literals from the corpus lowers accuracy and that heuristics for selecting LDA parameter values in the natural language context are suboptimal in the source code context.

Abstract: Feature location is a program comprehension activity, the goal of which is to identify source code entities that implement a functionality. Recent feature location techniques apply text retrieval models such as latent Dirichlet allocation (LDA) to corpora built from text embedded in source code. These techniques are highly configurable, and the literature offers little insight into how different configurations affect their performance. In this paper we present a study of an LDA based feature location technique (FLT) in which we measure the performance effects of using different configurations to index corpora and to retrieve 618 features from 6 open source Java systems. In particular, we measure the effects of the query, the text extractor configuration, and the LDA parameter values on the accuracy of the LDA based FLT. Our key findings are that exclusion of comments and literals from the corpus lowers accuracy and that heuristics for selecting LDA parameter values in the natural language context are suboptimal in the source code context. Based on the results of our case study, we offer specific recommendations for configuring the LDA based FLT.

80 citations

Metrics

Papers: 6,513

Citations: 245,225

No. of papers in the topic in previous years
Year	Papers
2023	323
2022	842
2021	418
2020	429
2019	473
2018	446
