Topic

Latent Dirichlet allocation

About: Latent Dirichlet allocation is a research topic. Over its lifetime, 5,351 publications have been published within this topic, receiving 212,555 citations. The topic is also known as LDA.

Papers

Journal Article

Analyzing knowledge flows of scientific literature through semantic links: a case study in the field of energy

Saeed-Ul Hassan¹, Peter Haddawy²

Information Technology University¹, Mahidol University²

01 Apr 2015-Scientometrics

TL;DR: A new technique to semantically analyze knowledge flows across countries by using publication and citation data is proposed; the results indicate that Japanese researchers focus on research areas such as the efficient use of Photovoltaics, Energy Conversion and Superconductors (to produce low-cost renewable energy).

Abstract: In this paper we propose a new technique to semantically analyze knowledge flows across countries by using publication and citation data. We start with the identification of research topics produced by a given source country. Then, we collect papers, published by authors outside the source country, that cite the identified research topics. Finally, we group each set of citing papers separately to determine the scholarly impact of the identified research topics in the cited topics. The research topics are identified using our proposed topic model with distance matrix, an extension of the classic Latent Dirichlet Allocation model. We also present a case study to illustrate the use of our proposed techniques in the subject area of Energy during 2004–2009 using the Scopus database. We compare the Japanese and Chinese papers that cite the scientific literature produced by researchers from the United States in order to show the difference in the use of the same knowledge. The results indicate that Japanese researchers focus on research areas such as the efficient use of Photovoltaics, Energy Conversion and Superconductors (to produce low-cost renewable energy). In contrast, Chinese researchers focus on the areas of Power Systems, Power Grids and Solar Cells production. Such analyses are useful for understanding the dynamics of the relevant knowledge flows across nations.

37 citations

Journal Article

A Systematic Spatial and Temporal Sentiment Analysis on Geo-Tweets

Tao Hu¹, Bing She², Lian Duan, Han Yue³, Julaine Clunis⁴

Harvard University¹, University of Michigan², Wuhan University³, Kent State University⁴

01 Jan 2020-IEEE Access

TL;DR: Local users' sentiments extracted from geo-tweet data from January to December 2016 are explored from spatial and temporal perspectives, revealing patterns that demonstrate the associations between the nature of Twitter content and the characteristics of places and users.

Abstract: Sentiment affects every aspect of people's lives and has a strong impact on their mental health. This paper explores local users' sentiments extracted from geo-tweet data from January to December 2016, analyzed from spatial and temporal perspectives. Because of the large amount of noisy data and the complicated procedure for extracting local users, a workflow is created so that other researchers can reproduce, replicate or extend the procedures using similar geo-tweet datasets. The workflow is shared at Harvard Dataverse (https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/6N9VUF). Using the processed data, each tweet's sentiment is classified according to its content. Then, the overall temporal variations in the total number of positive, neutral, and negative sentiments are analyzed at monthly, daily and hourly levels. From a spatial perspective, the Local Indicators of Spatial Association (LISA) statistical method is employed to discover spatial clusters. In order to explore the content of positive sentiments, this paper applies the Latent Dirichlet Allocation (LDA) model to classify the geo-tweets with positive sentiments into different topics. Combining the geospatial information with the topics reveals patterns that demonstrate the associations between the nature of Twitter content and the characteristics of places and users. For example, weekend events and gatherings of friends and family are times when users prefer to post positive tweets. In the western part of the US, users tend to post more photos to share great moments on Twitter than users in other parts of the US.

37 citations

Journal Article

Statistical modeling of biomedical corpora: mining the Caenorhabditis Genetic Center Bibliography for genes related to life span

David M. Blei¹, Kasian Franks², Michael I. Jordan³, IS Mian²

Princeton University¹, Lawrence Berkeley National Laboratory², University of California, Berkeley³

08 May 2006-BMC Bioinformatics

TL;DR: To illustrate the practical utility of LDA models of biomedical corpora, a trained CGC LDA model was used for a retrospective study of nematode genes known to be associated with life span modification, and a novel, pairwise document similarity measure based on the posterior distribution on the topic simplex was formulated.

Abstract: The statistical modeling of biomedical corpora could yield integrated, coarse-to-fine views of biological phenomena that complement discoveries made from analysis of molecular sequence and profiling data. Here, the potential of such modeling is demonstrated by examining the 5,225 free-text items in the Caenorhabditis Genetic Center (CGC) Bibliography using techniques from statistical information retrieval. Items in the CGC biomedical text corpus were modeled using the Latent Dirichlet Allocation (LDA) model. LDA is a hierarchical Bayesian model which represents a document as a random mixture over latent topics; each topic is characterized by a distribution over words. An LDA model estimated from CGC items had better predictive performance than two standard models (unigram and mixture of unigrams) trained using the same data. To illustrate the practical utility of LDA models of biomedical corpora, a trained CGC LDA model was used for a retrospective study of nematode genes known to be associated with life span modification. Corpus-, document-, and word-level LDA parameters were combined with terms from the Gene Ontology to enhance the explanatory value of the CGC LDA model, and to suggest additional candidates for age-related genes. A novel, pairwise document similarity measure based on the posterior distribution on the topic simplex was formulated and used to search the CGC database for "homologs" of a "query" document discussing the life span-modifying clk-2 gene. Inspection of these document homologs enabled and facilitated the production of hypotheses about the function and role of clk-2. Like other graphical models for genetic, genomic and other types of biological data, LDA provides a method for extracting unanticipated insights and generating predictions amenable to subsequent experimental validation.
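
The description of LDA in this abstract (a document as a random mixture over latent topics, each topic a distribution over words) can be illustrated with a minimal sketch of the generative process. This is only an illustration under stated assumptions, not the authors' code: the function names, the toy sizes (3 topics, 6 vocabulary words) and the concentration value 0.5 are hypothetical, and the sketch samples documents from the model rather than fitting it to a corpus.

```python
import random


def sample_dirichlet(rng, alphas):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(rng, topic_word, alpha, doc_length):
    """Sample one document from the LDA generative process.

    topic_word[k][v] is the probability of vocabulary word v under topic k.
    alpha is a symmetric Dirichlet concentration (a hypothetical choice here).
    """
    num_topics = len(topic_word)
    vocab_size = len(topic_word[0])
    # Per-document mixture over topics.
    theta = sample_dirichlet(rng, [alpha] * num_topics)
    topics, words = [], []
    for _ in range(doc_length):
        # Pick a latent topic for this word position.
        z = rng.choices(range(num_topics), weights=theta)[0]
        # Pick the word from that topic's word distribution.
        w = rng.choices(range(vocab_size), weights=topic_word[z])[0]
        topics.append(z)
        words.append(w)
    return theta, topics, words


rng = random.Random(0)
# Toy setup (assumed): 3 topics over a 6-word vocabulary.
topic_word = [sample_dirichlet(rng, [0.5] * 6) for _ in range(3)]
theta, topics, words = generate_document(rng, topic_word, alpha=0.5, doc_length=20)
print("topic mixture:", [round(t, 3) for t in theta])
print("word ids:", words)
```

Inference runs this process in reverse: given only the observed words, it estimates the topic mixtures and topic-word distributions.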

37 citations

Proceedings Article

Yik-Cheung Tam¹, Tanja Schultz¹

Carnegie Mellon University¹

15 Apr 2007

TL;DR: A latent Dirichlet-tree allocation (LDTA) model - a correlated latent semantic model - for unsupervised language model adaptation is proposed and empirical results show that the LDTA model has a faster training convergence than the LDA model with the same initial flat model.

Abstract: We propose a latent Dirichlet-tree allocation (LDTA) model - a correlated latent semantic model - for unsupervised language model adaptation. The LDTA model extends the latent Dirichlet allocation (LDA) model by replacing a Dirichlet prior with a Dirichlet-tree prior over the topic proportions. Latent topics under the same subtree are expected to be more correlated than topics under different subtrees. The LDTA model reduces to the LDA model when a depth-one Dirichlet-tree is used, and it fits into the variational Bayes inference framework employed in the LDA model. Empirical results show that the LDTA model has a faster training convergence than the LDA model with the same initial flat model. Experimental results show that the LDTA-adapted LM performed better than the LDA-adapted LM on the Mandarin RT04-eval set when the models were trained using a small text corpus, while both models had the same recognition performance when the models were trained using a big text corpus. We observed a 0.4% absolute CER reduction after LM adaptation using LSA marginals.
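
How a Dirichlet-tree prior yields topic proportions can be sketched as follows: draw a Dirichlet sample at each internal node, then multiply the branch probabilities along the path from the root to each leaf topic. This is a minimal sketch under stated assumptions, not the paper's implementation: the tree encoding (a leaf is a topic index, an internal node is a list of `(alpha, child)` pairs), the function names and all concentration values are hypothetical, and the variational Bayes inference is not shown.

```python
import random


def sample_dirichlet(rng, alphas):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def sample_dirichlet_tree(rng, node, mass=1.0, out=None):
    """Sample topic proportions from a Dirichlet-tree prior.

    node is either an int (leaf: a topic index) or a list of
    (alpha, child) branches (internal node). This encoding is an
    assumption made for the sketch.
    """
    if out is None:
        out = {}
    if isinstance(node, int):
        # Leaf: its proportion is the product of branch probabilities so far.
        out[node] = mass
        return out
    # Internal node: split the incoming mass with a Dirichlet draw.
    branch_probs = sample_dirichlet(rng, [alpha for alpha, _ in node])
    for prob, (_, child) in zip(branch_probs, node):
        sample_dirichlet_tree(rng, child, mass * prob, out)
    return out


rng = random.Random(0)
# Hypothetical two-level tree: topics 0 and 1 share one subtree, 2 and 3 another.
tree = [(1.0, [(1.0, 0), (1.0, 1)]), (1.0, [(1.0, 2), (1.0, 3)])]
# Depth-one tree: every topic hangs off the root, i.e. an ordinary Dirichlet.
flat = [(1.0, 0), (1.0, 1), (1.0, 2), (1.0, 3)]

tree_props = sample_dirichlet_tree(rng, tree)
flat_props = sample_dirichlet_tree(rng, flat)
print("tree prior:", {k: round(v, 3) for k, v in sorted(tree_props.items())})
print("flat prior:", {k: round(v, 3) for k, v in sorted(flat_props.items())})
```

Topics under the same subtree share the mass drawn for that subtree, which is how the tree structure ties them together. With the depth-one `flat` tree there is a single Dirichlet draw at the root, matching the LDA case mentioned in the abstract.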

37 citations

Journal Article

Ranking of high-value social audiences on Twitter

Siaw Ling Lo¹, Raymond Chiong¹, David Cornforth¹

University of Newcastle¹

01 May 2016

TL;DR: A ranking mechanism capable of identifying the top-k social audience members on Twitter based on an index that has the potential to be adopted in real-world applications for differentiating prospective customers from the general audience and enabling market segmentation for better business decision making is presented.

Abstract: Even though social media offers plenty of business opportunities, identifying the right audience from the massive amount of social media data is highly challenging for a company with finite resources and marketing budgets. In this paper, we present a ranking mechanism that is capable of identifying the top-k social audience members on Twitter based on an index. Data from three different Twitter business account owners were used in our experiments to validate this ranking mechanism. The results show that the index developed using a combination of semi-supervised and supervised learning methods is indeed generic enough to retrieve relevant audience members from the three different data sets. This approach of combining Fuzzy Match, Twitter Latent Dirichlet Allocation and Support Vector Machine Ensemble is able to leverage the content of account owners to construct seed words and training data sets with minimal annotation effort. We conclude that this ranking mechanism has the potential to be adopted in real-world applications for differentiating prospective customers from the general audience and enabling market segmentation for better business decision making.

An approach to rank the high-value social audience (HVSA) on Twitter is proposed. An HVSA index is developed using various methods with minimal annotation effort. Top-k HVSA members are identified from three data sets of different nature. A pooling strategy and Average Precision@K are recommended for the HVSA ranking. Audience segmentation on the ranked HVSA enables better decision making.

37 citations

Performance

Metrics

Papers: 6,513

Citations: 245,225

No. of papers in the topic in previous years
Year	Papers
2023	323
2022	842
2021	418
2020	429
2019	473
2018	446
