Latent Dirichlet allocation

About: Latent Dirichlet allocation (LDA) is a research topic. Over its lifetime, 5,351 publications have appeared on this topic, receiving 212,555 citations in total.


Papers
Journal Article
TL;DR: This study provides a structured topography for finance researchers seeking to integrate machine learning approaches into their exploration of finance phenomena, and showcases the benefits of probabilistic topic modeling for deep comprehension of a body of literature.
Abstract: We provide a first comprehensive structuring of the literature applying machine learning to finance. We use a probabilistic topic modeling approach to make sense of this diverse body of research spanning the disciplines of finance, economics, computer science, and decision sciences. Through this approach, a Latent Dirichlet Allocation technique, we extract 14 coherent research topics from the 5,204 academic articles we analyze, published between 1990 and 2018. We first describe and structure these topics, and then show how the topic focus has evolved over the last two decades. Our study thus provides a structured topography for finance researchers seeking to integrate machine learning approaches into their exploration of finance phenomena. We also showcase the benefits to finance researchers of probabilistic topic modeling for deep comprehension of a body of literature, especially when that literature has diverse, multi-disciplinary actors.

39 citations
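The pipeline this paper describes (vectorize a corpus, fit LDA, read off the top words per topic) follows a standard recipe. Below is a minimal sketch of that recipe using scikit-learn; the toy corpus, preprocessing choices, and hyperparameters are illustrative assumptions, not the authors' setup — only the choice of 14 topics mirrors the paper.

```python
# Minimal LDA topic-extraction sketch (scikit-learn), not the paper's code.
# The toy corpus and all preprocessing choices are illustrative assumptions.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

abstracts = [
    "machine learning models for stock return prediction",
    "neural networks forecast asset price volatility",
    "credit risk scoring with support vector machines",
    # ... the paper analyzes 5,204 finance abstracts from 1990-2018
]

# Bag-of-words representation; stop-word removal is an illustrative choice.
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(abstracts)

# The paper extracts 14 topics; n_components mirrors that choice.
lda = LatentDirichletAllocation(n_components=14, random_state=0)
doc_topics = lda.fit_transform(X)          # per-document topic mixtures

# Top words per topic, read off the topic-word weight matrix.
vocab = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [vocab[i] for i in weights.argsort()[::-1][:5]]
    print(f"topic {k}: {top}")
```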

Proceedings Article
07 Sep 2009
TL;DR: This paper introduces an approach based on Latent Dirichlet Allocation (LDA) for recommending tags for resources, and evaluates recall and precision on the BibSonomy benchmark provided within the ECML PKDD Discovery Challenge 2009.
Abstract: Tagging systems have become major infrastructures on the Web. They allow users to create tags that annotate and categorize content and to share them with other users, which is particularly helpful for searching multimedia content. However, as tagging is not constrained by a controlled vocabulary or annotation guidelines, tags tend to be noisy and sparse. New resources annotated by only a few users, in particular, often carry rather idiosyncratic tags that do not reflect a common perspective useful for search. In this paper we introduce an approach based on Latent Dirichlet Allocation (LDA) for recommending tags for resources. Resources annotated by many users, and thus equipped with a fairly stable and complete tag set, are used to elicit latent topics represented as mixtures of description tokens and tags. New resources are then mapped to these latent topics based on their content, in order to recommend the most likely tags from the latent topics. We evaluate recall and precision on the BibSonomy benchmark provided within the ECML PKDD Discovery Challenge 2009.

39 citations
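The recommendation scheme the abstract outlines — learn topics as mixtures over description tokens and tags, then score tags for a new resource through its inferred topic mixture — can be sketched as follows. This is a hedged approximation of the general recipe, not the authors' system; the toy resources, tag vocabulary, and scikit-learn machinery are all assumptions.

```python
# Hedged sketch of LDA-based tag recommendation: train on well-annotated
# resources whose "words" are content tokens plus tags, then score tags for
# a new resource via p(tag | doc) = sum_k p(k | doc) * p(tag | k).
# All data and hyperparameters are illustrative assumptions.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Training resources: description tokens with their (stable) tags appended.
train_docs = [
    "jazz saxophone recording tag_music tag_jazz",
    "piano concerto live performance tag_music tag_classical",
    "goal highlights football match tag_sports tag_football",
]
tags = ["tag_music", "tag_jazz", "tag_classical", "tag_sports", "tag_football"]

vec = CountVectorizer(token_pattern=r"\S+")
X = vec.fit_transform(train_docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# New resource: content tokens only, no tags yet.
x_new = vec.transform(["saxophone solo recording"])
theta = lda.transform(x_new)[0]                      # p(topic | new doc)

# Normalize topic-word weights into p(word | topic), then score each tag.
phi = lda.components_ / lda.components_.sum(axis=1, keepdims=True)
vocab = vec.get_feature_names_out()
tag_idx = [np.where(vocab == t)[0][0] for t in tags]
scores = theta @ phi[:, tag_idx]                     # p(tag | new doc), up to scaling

for t, s in sorted(zip(tags, scores), key=lambda p: -p[1]):
    print(f"{t}: {s:.3f}")                           # recommend the top-ranked tags
```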

Journal Article
TL;DR: An improved version of AdaBoost.MH called RFBoost is proposed, along with two methods for ranking features: One Boosting Round and Labeled Latent Dirichlet Allocation (LLDA), a supervised topic model based on Gibbs sampling.
Abstract: The AdaBoost.MH boosting algorithm is considered to be one of the most accurate algorithms for multi-label classification. AdaBoost.MH works by iteratively building a committee of weak hypotheses of decision stumps. In each round of AdaBoost.MH learning, all features are examined, but only one feature is used to build a new weak hypothesis. This learning mechanism may entail a high degree of computational time complexity, particularly in the case of a large-scale dataset. This paper describes a way to manage the learning complexity and improve the classification performance of AdaBoost.MH. We propose an improved version of AdaBoost.MH, called RFBoost. Weak learning in RFBoost is based on filtering a small, fixed number of ranked features in each boosting round rather than using all features, as AdaBoost.MH does. We propose two methods for ranking the features: One Boosting Round and Labeled Latent Dirichlet Allocation (LLDA), a supervised topic model based on Gibbs sampling. Additionally, we investigate the use of LLDA as a feature selection method for reducing the feature space based on the maximal conditional probabilities of words across labels. Our experimental results on eight well-known benchmarks for multi-label text categorisation show that RFBoost is significantly more efficient and effective than the baseline algorithms. Moreover, the LLDA-based feature ranking yields the best performance for RFBoost.

39 citations
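The LLDA-based feature selection idea — rank words by their maximal conditional probability across labels and keep only a small fixed number per boosting round — reduces to a simple ranking once per-label word distributions are available. A toy sketch follows, with label-wise count estimates standing in for the paper's Gibbs-sampled LLDA distributions; the data and the count-based estimate are assumptions made for illustration.

```python
# Hedged sketch of LLDA-style feature ranking: given p(w | label) for each
# label (in the paper, estimated by Labeled LDA via Gibbs sampling; here,
# simple label-wise counts), score each word by its maximal conditional
# probability across labels and keep the top-ranked features.
import numpy as np

vocab = ["goal", "match", "election", "vote", "the"]
# Toy word-count matrix: rows = labels, columns = words in `vocab`.
counts = np.array([
    [9, 7, 0, 1, 20],   # label "sports"
    [0, 1, 8, 9, 22],   # label "politics"
], dtype=float)

# Normalize rows into p(w | label); a smoothing prior would be added in practice.
p_w_given_label = counts / counts.sum(axis=1, keepdims=True)

# Rank features by their best conditional probability over all labels ...
scores = p_w_given_label.max(axis=0)
ranking = np.argsort(scores)[::-1]

# ... and keep a small fixed number of top features per boosting round.
top_m = 3
selected = [vocab[i] for i in ranking[:top_m]]
print(selected)   # frequent-but-uninformative words like "the" can rank high
                  # with raw counts; the LLDA estimate is what makes the
                  # ranking discriminative in the paper.
```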

Journal Article
TL;DR: A hierarchical Pitman-Yor-Dirichlet (HPYD) process is presented as a nonparametric prior for inferring the predictive probabilities of smoothed n-grams with integrated topic information, reflecting the properties of natural language in the estimated HPYD language model.
Abstract: Probabilistic models are often viewed as insufficiently expressive because of strong limitations and assumptions on the probabilistic distribution and the fixed model complexity. Bayesian nonparametric learning pursues an expressive probabilistic representation based on nonparametric prior and posterior distributions, with a less assumption-laden approach to inference. This paper presents a hierarchical Pitman-Yor-Dirichlet (HPYD) process as the nonparametric prior to infer the predictive probabilities of smoothed n-grams with integrated topic information. A hierarchical Chinese restaurant process metaphor is proposed to infer the HPYD language model (HPYD-LM) via Gibbs sampling. This process is equivalent to implementing the hierarchical Dirichlet process-latent Dirichlet allocation (HDP-LDA) with the twisted hierarchical Pitman-Yor LM (HPY-LM) as base measures. Accordingly, we produce power-law distributions and extract semantic topics to reflect the properties of natural language in the estimated HPYD-LM. The superiority of HPYD-LM over HPY-LM and other language models is demonstrated by experiments on model perplexity and speech recognition.

39 citations
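At the heart of the HPYD construction is the two-parameter Poisson-Dirichlet (Pitman-Yor) process, whose Chinese-restaurant seating rule produces the power-law behaviour the paper exploits. A toy sampler for that seating rule alone (not the full hierarchical HPYD-LM) might look like this; the parameter values are arbitrary:

```python
# Toy sampler for the two-parameter Poisson-Dirichlet (Pitman-Yor) Chinese
# restaurant process underlying the paper's hierarchical construction.
# This illustrates only the power-law seating behaviour, not the HPYD-LM.
import random

def pitman_yor_crp(n_customers, discount=0.5, concentration=1.0, seed=0):
    rng = random.Random(seed)
    tables = []                                  # tables[k] = customers at table k
    for n in range(n_customers):
        # Existing table k has weight (n_k - discount); a new table has
        # weight (concentration + discount * K). Total weight is n + concentration.
        weights = [c - discount for c in tables]
        weights.append(concentration + discount * len(tables))
        r = rng.uniform(0, n + concentration)
        for k, w in enumerate(weights):
            r -= w
            if r <= 0:
                break
        if k == len(tables):
            tables.append(1)                     # open a new table
        else:
            tables[k] += 1
    return tables

sizes = sorted(pitman_yor_crp(10_000), reverse=True)
print(len(sizes), sizes[:10])   # many tables, heavy-tailed size distribution
```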

Proceedings Article
13 Dec 2010
TL;DR: By taking into account the sequential structure within a document, the SeqLDA model achieves higher fidelity than LDA in terms of perplexity (a standard measure of dictionary-based compressibility) and yields a more natural sequential topic structure.
Abstract: Understanding how topics within a document evolve over its structure is an interesting and important problem. In this paper, we address this problem by presenting a novel variant of Latent Dirichlet Allocation (LDA): Sequential LDA (SeqLDA). This variant directly considers the underlying sequential structure, i.e., a document consists of multiple segments (e.g., chapters, paragraphs), each of which is correlated to its previous and subsequent segments. In our model, a document and its segments are modelled as random mixtures of the same set of latent topics, each of which is a distribution over words. The topic distribution of each segment depends on that of its previous segment, and that of the first segment depends on the document-level topic distribution. This progressive dependency is captured by the nested two-parameter Poisson-Dirichlet process (PDP). We develop an efficient collapsed Gibbs sampling algorithm to sample from the posterior of the PDP. Our experimental results on patent documents show that, by taking into account the sequential structure within a document, our SeqLDA model achieves higher fidelity than LDA in terms of perplexity (a standard measure of dictionary-based compressibility). The SeqLDA model also yields a more natural sequential topic structure than LDA, as we show in experiments on books such as Melville's "The Whale".

39 citations
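SeqLDA's key move is the progressive dependency: each segment's topic distribution is drawn around its predecessor's, with the first segment anchored to the document-level distribution. The simulation below illustrates that drift using a mean-parameterized Dirichlet chain, theta_t ~ Dirichlet(strength * theta_{t-1}); note this is a deliberately simplified stand-in for the paper's nested Poisson-Dirichlet process, chosen only to show how topic mixtures change smoothly across segments.

```python
# Toy generative simulation of SeqLDA's progressive dependency: each segment's
# topic mixture is drawn around the previous one. NOTE: the paper uses a nested
# two-parameter Poisson-Dirichlet process; the mean-parameterized Dirichlet
# chain below is a simplified stand-in for illustration, not the authors' model.
import numpy as np

rng = np.random.default_rng(0)
n_topics, n_segments, strength = 4, 6, 50.0

# Document-level topic distribution; the first segment is drawn around it.
theta_doc = rng.dirichlet(np.ones(n_topics))

theta = theta_doc
for t in range(n_segments):
    # Higher `strength` keeps a segment's mixture closer to its predecessor's.
    theta = rng.dirichlet(strength * theta + 1e-3)   # small floor avoids zeros
    print(f"segment {t}: " + " ".join(f"{p:.2f}" for p in theta))
```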


Network Information
Related Topics (5)
Cluster analysis: 146.5K papers, 2.9M citations, 86% related
Support vector machine: 73.6K papers, 1.7M citations, 86% related
Deep learning: 79.8K papers, 2.1M citations, 85% related
Feature extraction: 111.8K papers, 2.1M citations, 84% related
Convolutional neural network: 74.7K papers, 2M citations, 83% related
Performance Metrics
Number of papers in the topic in previous years:

Year    Papers
2023    323
2022    842
2021    418
2020    429
2019    473
2018    446