scispace - formally typeset

Latent Dirichlet allocation

About: Latent Dirichlet allocation is a research topic. Over its lifetime, 5,351 publications have been published within this topic, receiving 212,555 citations. The topic is also known as: LDA.
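To make the topic concrete, here is a minimal collapsed Gibbs sampler for LDA; the toy corpus, the two-topic setting, and the hyperparameters are illustrative choices, not drawn from any paper listed below.

```python
# Minimal collapsed Gibbs sampler for LDA on a toy corpus.
# K, alpha, and beta are illustrative hyperparameter choices.
import random
from collections import defaultdict

random.seed(0)

docs = [["apple", "banana", "apple", "fruit"],
        ["python", "code", "python", "bug"],
        ["fruit", "banana", "apple"],
        ["code", "bug", "python"]]
K, alpha, beta = 2, 0.1, 0.01
vocab = sorted({w for d in docs for w in d})
V = len(vocab)

# Random initial topic assignment for every token.
z = [[random.randrange(K) for _ in d] for d in docs]
ndk = [[0] * K for _ in docs]               # doc-topic counts
nkw = [defaultdict(int) for _ in range(K)]  # topic-word counts
nk = [0] * K                                # topic totals
for di, d in enumerate(docs):
    for wi, w in enumerate(d):
        t = z[di][wi]
        ndk[di][t] += 1
        nkw[t][w] += 1
        nk[t] += 1

for _ in range(200):                        # Gibbs sweeps
    for di, d in enumerate(docs):
        for wi, w in enumerate(d):
            t = z[di][wi]                   # remove token from counts
            ndk[di][t] -= 1; nkw[t][w] -= 1; nk[t] -= 1
            # Sample a new topic proportional to the collapsed posterior.
            weights = [(ndk[di][k] + alpha) * (nkw[k][w] + beta) / (nk[k] + V * beta)
                       for k in range(K)]
            t = random.choices(range(K), weights)[0]
            z[di][wi] = t                   # add token back under new topic
            ndk[di][t] += 1; nkw[t][w] += 1; nk[t] += 1

# Inspect the top words per topic after sampling.
for k in range(K):
    top = sorted(nkw[k], key=nkw[k].get, reverse=True)[:3]
    print(k, top)
```

On a corpus this small and well-separated, the sampler typically pulls the fruit words and the code words into different topics, which is the behavior the papers below exploit at much larger scale.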


Papers
Proceedings ArticleDOI
24 Aug 2014
TL;DR: This paper uses Latent Dirichlet Allocation (LDA) to discover trending categories and styles on Etsy, which are then used to describe a user's "interest" profile, and explores hashing methods to perform fast nearest neighbor search on a map-reduce framework, in order to efficiently obtain recommendations.
Abstract: Purchasing decisions in many product categories are heavily influenced by the shopper's aesthetic preferences. It's insufficient to simply match a shopper with popular items from the category in question; a successful shopping experience also identifies products that match those aesthetics. The challenge of capturing shoppers' styles becomes more difficult as the size and diversity of the marketplace increases. At Etsy, an online marketplace for handmade and vintage goods with over 30 million diverse listings, the problem of capturing taste is particularly important -- users come to the site specifically to find items that match their eclectic styles. In this paper, we describe our methods and experiments for deploying two new style-based recommender systems on the Etsy site. We use Latent Dirichlet Allocation (LDA) to discover trending categories and styles on Etsy, which are then used to describe a user's "interest" profile. We also explore hashing methods to perform fast nearest neighbor search on a map-reduce framework, in order to efficiently obtain recommendations. These techniques have been implemented successfully at very large scale, substantially improving many key business metrics.

50 citations
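The fast nearest-neighbor hashing the Etsy paper mentions can be sketched with generic random-hyperplane locality-sensitive hashing; this is not Etsy's production implementation, and the vector dimension, bit count, and item set are illustrative.

```python
# Sketch of random-hyperplane LSH for approximate cosine nearest
# neighbors over user "interest" vectors. DIM and BITS are
# illustrative, not taken from the paper.
import random

random.seed(1)
DIM, BITS = 8, 16

# One random hyperplane per hash bit.
planes = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(BITS)]

def signature(vec):
    """Sign of the dot product with each hyperplane gives one bit."""
    return tuple(int(sum(p * x for p, x in zip(plane, vec)) >= 0)
                 for plane in planes)

def hamming(a, b):
    """Bit disagreements approximate angular distance."""
    return sum(x != y for x, y in zip(a, b))

# Index items by signature; queries compare short bit strings,
# not raw vectors, which is what makes the search cheap.
items = {i: [random.gauss(0, 1) for _ in range(DIM)] for i in range(100)}
sigs = {i: signature(v) for i, v in items.items()}

query = items[0]
qsig = signature(query)
nearest = min(items, key=lambda i: hamming(qsig, sigs[i]))
print(nearest)
```

Because signatures are short bit tuples, the same comparison maps naturally onto a map-reduce pass over bucketed candidates, which is the efficiency argument the abstract makes.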

Proceedings Article
23 Jun 2011
TL;DR: Experimental results confirm the HDP model's ability, when trained on out-of-domain data, to make use of a restricted set of topically coherent induced senses when subsequently applied in a restricted domain.
Abstract: We propose the use of a nonparametric Bayesian model, the Hierarchical Dirichlet Process (HDP), for the task of Word Sense Induction. Results are shown through comparison against Latent Dirichlet Allocation (LDA), a parametric Bayesian model employed by Brody and Lapata (2009) for this task. We find that the two models achieve similar levels of induction quality, while the HDP confers the advantage of automatically inducing a variable number of senses per word, as compared to manually fixing the number of senses a priori, as in LDA. This flexibility allows the model to adapt to terms with greater or lesser polysemy, when evidenced by corpus distributional statistics. Experimental results confirm the model's ability, when trained on out-of-domain data, to make use of a restricted set of topically coherent induced senses when subsequently applied in a restricted domain.

49 citations
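The key contrast in the HDP paper — a variable rather than fixed number of senses — rests on the Chinese Restaurant Process that underlies HDP-style priors. A toy CRP draw illustrates it; the concentration values are arbitrary.

```python
# Toy Chinese Restaurant Process draw, illustrating how an HDP-style
# nonparametric prior lets the number of clusters (word senses) grow
# with the data, instead of being fixed a priori as in LDA.
import random

random.seed(2)

def crp(n_tokens, gamma):
    """Assign each token to an existing or new cluster under a CRP."""
    counts = []                      # tokens per cluster so far
    for i in range(n_tokens):
        r = random.uniform(0, i + gamma)
        for k, c in enumerate(counts):
            if r < c:
                counts[k] += 1       # join an existing cluster
                break
            r -= c
        else:
            counts.append(1)         # open a new cluster w.p. gamma/(i+gamma)
    return counts

# Larger concentration gamma tends to induce more clusters.
print(len(crp(50, 0.5)), len(crp(50, 5.0)))
```

The same mechanism, embedded hierarchically, is what lets HDP allocate more senses to highly polysemous terms and fewer to unambiguous ones.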

Proceedings Article
01 Dec 2009
TL;DR: This paper addresses the problem of scientific research analysis by using the topic model Latent Dirichlet Allocation and a novel classifier to classify research papers based on topic and language, and shows various insightful statistics and correlations within and across three research fields: Linguistics, Computational Linguistics, and Education.
Abstract: This paper addresses the problem of scientific research analysis. We use the topic model Latent Dirichlet Allocation [2] and a novel classifier to classify research papers based on topic and language. Moreover, we show various insightful statistics and correlations within and across three research fields: Linguistics, Computational Linguistics, and Education. In particular, we show how topics change over time within each field, what relations and influences exist between topics within and across fields, as well as what trends can be established for some of the world’s natural languages. Finally, we talk about trend prediction and topic suggestion as future extensions of this research.

49 citations

Journal ArticleDOI
TL;DR: An innovative graph topic model (GTM) is proposed to address this issue, which uses Bernoulli distributions to model the edges between nodes in a graph, making the edges contribute to latent topic discovery and further improving the accuracy of the supervised and unsupervised learning of graphs.
Abstract: Graph mining has been a popular research area because of its numerous application scenarios. Many unstructured and structured data can be represented as graphs, such as documents, chemical molecular structures, and images. However, an issue with current research on graphs is that existing methods cannot adequately discover the topics hidden in graph-structured data, which can be beneficial for both the unsupervised and supervised learning of graphs. Although topic models have proved to be very successful in discovering latent topics, standard topic models cannot be directly applied to graph-structured data due to the “bag-of-word” assumption. In this paper, an innovative graph topic model (GTM) is proposed to address this issue, which uses Bernoulli distributions to model the edges between nodes in a graph. It can, therefore, make the edges in a graph contribute to latent topic discovery and further improve the accuracy of the supervised and unsupervised learning of graphs. The experimental results on two different types of graph datasets show that the proposed GTM outperforms latent Dirichlet allocation on classification by using the unveiled topics of these two models to represent graphs.

49 citations
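A sketch loosely in the spirit of the GTM abstract: score a graph's edges under a Bernoulli model where each node carries a latent topic and an edge between topics (s, t) appears with probability p[s][t]. The graph, topic assignments, and probability table here are all hypothetical.

```python
# Bernoulli edge likelihood: edges are coin flips whose bias depends
# on the latent topics of their endpoints. All values are illustrative.
import math

p = [[0.8, 0.1],
     [0.1, 0.7]]                      # per-topic-pair edge probabilities
topic = {0: 0, 1: 0, 2: 1, 3: 1}      # hypothetical latent topic per node
edges = {(0, 1), (2, 3)}              # observed undirected edges (u < v)

def edge_loglik(nodes, edges, topic, p):
    """Log-likelihood of the presence/absence of every node pair."""
    ll = 0.0
    nodes = sorted(nodes)
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            u, v = nodes[i], nodes[j]
            prob = p[topic[u]][topic[v]]
            ll += math.log(prob if (u, v) in edges else 1 - prob)
    return ll

print(edge_loglik(topic.keys(), edges, topic, p))
```

Assignments that place densely connected nodes in the same topic score higher, which is the sense in which edges "contribute to latent topic discovery" in a model of this family.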

Journal ArticleDOI
TL;DR: A comprehensive evaluation that compares a large number of BoF variants on three different MIR tasks, by considering different ways of low-level feature representation, codebook construction, codeword assignment, segment-level and song-level feature pooling, tf-idf term weighting, power normalization, and dimension reduction, leads to the following findings.
Abstract: There has been increasing attention on learning feature representations from the complex, high-dimensional audio data applied in various music information retrieval (MIR) problems. Unsupervised feature learning techniques, such as sparse coding and deep belief networks, have been utilized to represent music information as a term-document structure comprising elementary audio codewords. Despite the widespread use of such bag-of-frames (BoF) models, few attempts have been made to systematically compare different component settings. Moreover, whether techniques developed in the text retrieval community are applicable to audio codewords is poorly understood. To further our understanding of the BoF model, we present in this paper a comprehensive evaluation that compares a large number of BoF variants on three different MIR tasks, by considering different ways of low-level feature representation, codebook construction, codeword assignment, segment-level and song-level feature pooling, tf-idf term weighting, power normalization, and dimension reduction. Our evaluations lead to the following findings: 1) modeling music information by two levels of abstraction improves the result for difficult tasks such as predominant instrument recognition; 2) tf-idf weighting and power normalization improve system performance in general; 3) topic modeling methods such as latent Dirichlet allocation do not work for audio codewords.

49 citations
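Two of the post-processing steps this evaluation compares, tf-idf weighting and power normalization, can be sketched on toy codeword histograms; the histograms and the 0.5 exponent are illustrative, not the paper's data.

```python
# tf-idf weighting followed by power ("square-root") normalization
# on bag-of-frames codeword histograms. Toy data; codebook size 4.
import math

# Each song is a histogram over a small codebook of audio codewords.
songs = [[4, 0, 1, 0],
         [2, 3, 0, 0],
         [0, 1, 2, 5]]
n_docs = len(songs)
n_words = len(songs[0])

# idf: codewords that occur in many songs carry little information.
df = [sum(1 for s in songs if s[w] > 0) for w in range(n_words)]
idf = [math.log(n_docs / df[w]) for w in range(n_words)]

def tfidf_power(hist, alpha=0.5):
    """tf-idf weight each bin, then compress large counts with x**alpha."""
    weighted = [h * idf[w] for w, h in enumerate(hist)]
    return [x ** alpha for x in weighted]

print([round(x, 3) for x in tfidf_power(songs[0])])
```

The power step damps bursty codewords (a frame pattern repeating many times in one song), which is one common explanation for why the paper finds it helps across tasks.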


Network Information
Related Topics (5)

Cluster analysis: 146.5K papers, 2.9M citations, 86% related
Support vector machine: 73.6K papers, 1.7M citations, 86% related
Deep learning: 79.8K papers, 2.1M citations, 85% related
Feature extraction: 111.8K papers, 2.1M citations, 84% related
Convolutional neural network: 74.7K papers, 2M citations, 83% related
Performance
Metrics
No. of papers in the topic in previous years

Year    Papers
2023    323
2022    842
2021    418
2020    429
2019    473
2018    446