Topic

Latent Dirichlet allocation

About: Latent Dirichlet allocation is a research topic. Over its lifetime, 5,351 publications have been published within this topic, receiving 212,555 citations. The topic is also known as: LDA.

Proceedings Article

Efficient topic-based unsupervised name disambiguation

Yang Song¹, Jian Huang¹, Isaac G. Councill¹, Jia Li¹, C. Lee Giles¹•Institutions (1)

Pennsylvania State University¹

18 Jun 2007

TL;DR: This paper presents an efficient and effective two-stage approach to disambiguating person names within web pages and scientific documents, and empirically addresses scalability by disambiguating authors in over 750,000 papers from the entire CiteSeer dataset.

Abstract: Name ambiguity is a special case of identity uncertainty where one person can be referenced by multiple name variations in different situations or even share the same name with other people. In this paper, we focus on the problem of disambiguating person names within web pages and scientific documents. We present an efficient and effective two-stage approach to disambiguate names. In the first stage, two novel topic-based models are proposed by extending two hierarchical Bayesian text models, namely Probabilistic Latent Semantic Analysis (PLSA) and Latent Dirichlet Allocation (LDA). Our models explicitly introduce a new variable for persons and learn the distribution of topics with regard to persons and words. After learning an initial model, the topic distributions are treated as feature sets and names are disambiguated by leveraging a hierarchical agglomerative clustering method. Experiments on web data and scientific documents from CiteSeer indicate that our approach consistently outperforms other unsupervised learning methods such as spectral clustering and DBSCAN clustering and could be extended to other research fields. We empirically addressed the issue of scalability by disambiguating authors in over 750,000 papers from the entire CiteSeer dataset.

172 citations
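
The second stage described in the abstract above (topic distributions used as features, then hierarchical agglomerative clustering) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract names neither the distance nor the linkage, so Hellinger distance, average linkage, the stopping threshold, and the toy topic vectors are all assumptions made for illustration.

```python
import math
from itertools import combinations


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (assumed metric)."""
    return math.sqrt(
        0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    )


def agglomerative(points, threshold):
    """Average-linkage agglomerative clustering.

    Repeatedly merges the two closest clusters and stops once the closest
    pair is farther apart than `threshold`.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Mean pairwise distance between members of the two clusters.
        total = sum(hellinger(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        (x, y), dist = min(
            (
                ((a, b), linkage(clusters[a], clusters[b]))
                for a, b in combinations(range(len(clusters)), 2)
            ),
            key=lambda item: item[1],
        )
        if dist > threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # safe: combinations guarantees x < y
    return [sorted(c) for c in clusters]


# Hypothetical topic distributions for four mentions of one ambiguous name.
topic_features = [
    [0.90, 0.05, 0.05],
    [0.85, 0.10, 0.05],
    [0.05, 0.05, 0.90],
    [0.10, 0.05, 0.85],
]
clusters = agglomerative(topic_features, threshold=0.3)
print(clusters)
```

Each resulting cluster would be read as one distinct person; in practice the threshold (or the number of clusters) has to be chosen from the data.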

Journal Article

Hyperfeatures: Multilevel Local Coding for Visual Recognition

Ankur Agarwal¹, Bill Triggs¹•Institutions (1)

French Institute for Research in Computer Science and Automation¹

01 Jan 2006-Lecture Notes in Computer Science

TL;DR: This article proposes hyperfeatures, a multilevel visual representation designed to exploit spatial co-occurrence statistics at scales larger than the local input patches, which histograms of local appearance descriptors cannot do.

Abstract: Histograms of local appearance descriptors are a popular representation for visual recognition. They are highly discriminant and have good resistance to local occlusions and to geometric and photometric variations, but they are not able to exploit spatial co-occurrence statistics at scales larger than their local input patches. We present a new multilevel visual representation, 'hyperfeatures', that is designed to remedy this. The starting point is the familiar notion that to detect object parts, in practice it often suffices to detect co-occurrences of more local object fragments - a process that can be formalized as comparison (e.g. vector quantization) of image patches against a codebook of known fragments, followed by local aggregation of the resulting codebook membership vectors to detect co-occurrences. This process converts local collections of image descriptor vectors into somewhat less local histogram vectors - higher-level but spatially coarser descriptors. We observe that as the output is again a local descriptor vector, the process can be iterated, and that doing so captures and codes ever larger assemblies of object parts and increasingly abstract or 'semantic' image properties. We formulate the hyperfeatures model and study its performance under several different image coding methods including clustering based Vector Quantization, Gaussian Mixtures, and combinations of these with Latent Dirichlet Allocation. We find that the resulting high-level features provide improved performance in several object image and texture image classification tasks.

171 citations
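
The iterated coding step described in the abstract above (quantize descriptors against a codebook, aggregate the memberships locally into histograms, then treat those histograms as descriptors for the next level) can be sketched as follows. This is a simplified illustration, not the authors' code: it uses a 1-D strip of patches instead of a 2-D image, hard vector quantization only, and hand-written hypothetical codebooks rather than learned ones.

```python
def nearest_code(descriptor, codebook):
    """Index of the codebook entry closest to `descriptor` (squared Euclidean)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((d - c) ** 2 for d, c in zip(descriptor, codebook[k])),
    )


def code_level(descriptors, codebook, window):
    """One coding level: quantize, then histogram codes over sliding windows.

    The output is again a list of descriptor vectors (normalized histograms),
    so it can be fed into another call with a higher-level codebook.
    """
    codes = [nearest_code(d, codebook) for d in descriptors]
    histograms = []
    for start in range(len(codes) - window + 1):
        hist = [0.0] * len(codebook)
        for code in codes[start:start + window]:
            hist[code] += 1.0 / window
        histograms.append(hist)
    return histograms


# Hypothetical 1-D patch descriptors along a strip, and toy codebooks.
patches = [[0.0], [0.1], [1.0], [0.9]]
level0_codebook = [[0.0], [1.0]]
level1_codebook = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]

level0 = code_level(patches, level0_codebook, window=2)
level1 = code_level(level0, level1_codebook, window=2)
print(level0)
print(level1)
```

Each level covers a wider neighbourhood than the one below it, which is the sense in which the higher-level vectors are "spatially coarser" in the abstract.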

Journal Article

What Security Questions Do Developers Ask? A Large-Scale Study of Stack Overflow Posts

Xin Li Yang¹, David Lo², Xin Xia¹, Zhiyuan Wan¹, Jian Ling Sun¹•Institutions (2)

Zhejiang University¹, Singapore Management University²

05 Sep 2016-Journal of Computer Science and Technology

TL;DR: A large-scale study on security-related questions on Stack Overflow, which summarizes all the topics into five main categories, and investigates the popularity and difficulty of different topics as well.

Abstract: Security has always been a popular and critical topic, and with the rapid development of information technology it continues to attract attention. Because security has a long history, it covers a wide range of changing topics, from classic cryptography to recently popular mobile security. There is a need to investigate security-related topics and trends, which can guide security researchers, educators and practitioners. To address this need, we conduct a large-scale study of security-related questions on Stack Overflow, a popular online question and answer site where software developers communicate, collaborate, and share information with one another. Among the many topics covered by the questions posted on Stack Overflow, security-related questions make up a large proportion. We first use two heuristics to extract the security-related questions from the dataset based on the tags of the posts. We then use an advanced topic model, Latent Dirichlet Allocation (LDA) tuned using a Genetic Algorithm (GA), to cluster the security-related questions based on their texts. After obtaining the topics, we use the questions' metadata to carry out various analyses. We summarize all the topics into five main categories, and investigate the popularity and difficulty of different topics as well. Based on the results of our study, we draw several implications for researchers, educators and practitioners.

170 citations

The Author-Recipient-Topic Model for Topic and Role Discovery in Social Networks: Experiments with Enron and Academic Email

Andrew McCallum¹, Andres Corrada-Emmanuel¹, Xuerui Wang¹•Institutions (1)

University of Massachusetts Amherst¹

01 Jan 2005

TL;DR: The authors propose the Author-Recipient-Topic (ART) model for social network analysis, which learns topic distributions based on the direction-sensitive messages sent between entities. Its key addition is that the distribution over topics is conditioned distinctly on both the sender and recipient, steering the discovery of topics according to the relationships between people.

Abstract: Previous work in social network analysis (SNA) has modeled the existence of links from one entity to another, but not the language content or topics on those links. We present the Author-Recipient-Topic (ART) model for social network analysis, which learns topic distributions based on the direction-sensitive messages sent between entities. The model builds on Latent Dirichlet Allocation and the Author-Topic (AT) model, adding the key attribute that the distribution over topics is conditioned distinctly on both the sender and recipient, steering the discovery of topics according to the relationships between people. We give results on both the Enron email corpus and a researcher’s email archive, providing evidence not only that clearly relevant topics are discovered, but that the ART model better predicts people’s roles.

169 citations

Journal Article

Discovering themes and trends in transportation research using topic modeling

Lijun Sun¹, Yafeng Yin²•Institutions (2)

Massachusetts Institute of Technology¹, University of Michigan²

01 Apr 2017-Transportation Research Part C-emerging Technologies

TL;DR: An empirical analysis of 17,163 articles published in 22 leading transportation journals from 1990 to 2015 using a latent Dirichlet allocation (LDA) model to infer 50 key topics is presented, suggesting that research communities in different regions tend to focus on different sub-fields.

Abstract: Transportation research is a key area in both science and engineering. In this paper, we present an empirical analysis of 17,163 articles published in 22 leading transportation journals from 1990 to 2015. We apply a latent Dirichlet allocation (LDA) model to article abstracts to infer 50 key topics. We show that the resulting topics are both representative and meaningful, mostly corresponding to established sub-fields in transportation research. These identified fields reveal a research landscape for transportation. Based on the results of LDA, we quantify the similarity of journals and countries/regions in terms of their aggregated topic distributions. By measuring the variation of topic distributions over time, we find some general research trends; for example, topics on sustainability, travel behavior and non-motorized mobility are becoming increasingly popular. We also carry out this temporal analysis for each journal, observing a high degree of consistency for most journals. However, some interesting anomalies, such as special issues on particular topics, are detected from the temporal variation as well. By quantifying the temporal trends at the country/region level, we find that countries/regions display clearly distinguishable patterns, suggesting that research communities in different regions tend to focus on different sub-fields. Our results could benefit different parties in the academic community, including researchers, journal editors and funding agencies, in terms of identifying promising research topics/projects, finding candidate journals for a submission, and realigning focus for journal development.

168 citations
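
The journal-similarity step in the abstract above ("aggregated topic distributions") can be illustrated with a small sketch. The abstract does not say how distributions are aggregated or which similarity measure is used, so averaging per journal and cosine similarity are assumptions here, and the journal names and topic vectors are hypothetical stand-ins for LDA output.

```python
import math
from collections import defaultdict


def aggregate_by_journal(records):
    """Average the per-article topic distributions of each journal."""
    grouped = defaultdict(list)
    for journal, topics in records:
        grouped[journal].append(topics)
    return {
        journal: [sum(column) / len(rows) for column in zip(*rows)]
        for journal, rows in grouped.items()
    }


def cosine_similarity(p, q):
    """Cosine similarity between two topic profiles (assumed measure)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)


# Hypothetical (journal, per-article topic distribution) pairs.
records = [
    ("Journal A", [0.7, 0.2, 0.1]),
    ("Journal A", [0.6, 0.3, 0.1]),
    ("Journal B", [0.6, 0.2, 0.2]),
    ("Journal C", [0.1, 0.1, 0.8]),
]
profiles = aggregate_by_journal(records)
sim_ab = cosine_similarity(profiles["Journal A"], profiles["Journal B"])
sim_ac = cosine_similarity(profiles["Journal A"], profiles["Journal C"])
print(round(sim_ab, 3), round(sim_ac, 3))
```

The same aggregation could be keyed by country/region or by publication year to reproduce the other comparisons the abstract mentions.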

Metrics

Papers: 6,525

Citations: 245,260

No. of papers in the topic in previous years
Year	Papers
2023	323
2022	850
2021	420
2020	429
2019	473
2018	447
