Topic

Latent Dirichlet allocation

About: Latent Dirichlet allocation is a research topic. Over the lifetime, 5351 publications have been published within this topic receiving 212555 citations. The topic is also known as: LDA.
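For orientation, the sketch below fits LDA to a toy corpus with scikit-learn's LatentDirichletAllocation; the corpus, topic count, and library choice are illustrative assumptions, not something described on this page.

```python
# Minimal sketch: fit LDA to a toy corpus (illustrative only).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "topic models describe documents as mixtures of topics",
    "each topic is a distribution over words",
    "gibbs sampling and variational inference fit the model",
]

# LDA operates on bag-of-words count matrices.
counts = CountVectorizer().fit_transform(docs)

# n_components is the number of latent topics K.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)  # per-document topic proportions
print(doc_topics.shape)                 # (3, 2)
```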


Papers
Journal ArticleDOI
TL;DR: This paper proposes an unsupervised, accurate, and fast retrieval method for breast histopathological images that combines a local statistical feature of nuclei with Gabor texture features; building on this framework, the authors are developing an online digital slide browsing and retrieval platform applicable to computer-aided diagnosis, pathology education, and WSI archiving and management.
Abstract: In the field of pathology, whole slide image (WSI) has become the major carrier of visual and diagnostic information. Content-based image retrieval among WSIs can aid the diagnosis of an unknown pathological image by finding its similar regions in WSIs with diagnostic information. However, the huge size and complex content of WSI pose several challenges for retrieval. In this paper, we propose an unsupervised, accurate, and fast retrieval method for breast histopathological images. Specifically, the method introduces a local statistical feature that captures the morphology and distribution of nuclei, and employs the Gabor feature to describe the texture information. The latent Dirichlet allocation model is utilized for high-level semantic mining. Locality-sensitive hashing is used to speed up the search. Experiments on a WSI database with more than 8000 images from 15 types of breast histopathology demonstrate that our method achieves about 0.9 retrieval precision as well as promising efficiency. Based on the proposed framework, we are developing a search engine for an online digital slide browsing and retrieval platform, which can be applied in computer-aided diagnosis, pathology education, and WSI archiving and management.

48 citations
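The paper above uses locality-sensitive hashing to speed up search over LDA-derived representations. As a hedged illustration of the general technique (not the authors' implementation), the sketch below buckets topic vectors with random-hyperplane LSH; all names, shapes, and data are assumptions.

```python
# Sketch: random-hyperplane LSH over topic vectors (illustrative).
import numpy as np

rng = np.random.default_rng(0)
n_images, n_topics, n_bits = 8000, 50, 16

topic_vecs = rng.dirichlet(np.ones(n_topics), size=n_images)  # stand-in features
planes = rng.normal(size=(n_bits, n_topics))                  # random hyperplanes

def lsh_key(v):
    """Hash a topic vector to an n_bits binary bucket key."""
    return tuple((planes @ v) > 0)

# Index: bucket key -> list of image ids.
buckets = {}
for i, v in enumerate(topic_vecs):
    buckets.setdefault(lsh_key(v), []).append(i)

query = topic_vecs[0]
candidates = buckets[lsh_key(query)]  # near neighbors in topic space
```

Queries then scan only the matching bucket rather than all 8000 images, which is what makes the lookup fast.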

Book ChapterDOI
24 Sep 2012
TL;DR: This paper shows that FSTM can perform substantially better than various existing topic models under different performance measures, and provides a principled way to directly trade off sparsity of solutions against inference quality and running time.
Abstract: In this paper, we propose Fully Sparse Topic Model (FSTM) for modeling large collections of documents. Three key properties of the model are: (1) the inference algorithm converges in linear time, (2) learning of topics is simply a multiplication of two sparse matrices, (3) it provides a principled way to directly trade off sparsity of solutions against inference quality and running time. These properties enable us to speedily learn sparse topics and to infer sparse latent representations of documents, and help significantly save memory for storage. We show that inference in FSTM is actually MAP inference with an implicit prior. Extensive experiments show that FSTM can perform substantially better than various existing topic models by different performance measures. Finally, our parallel implementation can handily learn thousands of topics from large corpora with millions of terms.

48 citations
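FSTM's property (2) says topic learning reduces to a product of two sparse matrices. A minimal sketch of that idea with SciPy sparse matrices follows; the shapes, densities, and row normalization are illustrative assumptions, not the paper's actual update rule.

```python
# Sketch: topic update as one sparse-sparse matrix product (illustrative).
import numpy as np
import scipy.sparse as sp

D, V, K = 1000, 5000, 20  # documents, vocabulary size, topics

docs = sp.random(D, V, density=0.01, random_state=0, format="csr")   # term counts
theta = sp.random(D, K, density=0.10, random_state=1, format="csr")  # sparse doc-topic

# Topic update: a single K x V sparse product, which stays sparse.
topics = (theta.T @ docs).tocsr()

# Row-normalize so each topic is a distribution over the vocabulary.
row_sums = np.asarray(topics.sum(axis=1)).ravel()
topics = sp.diags(1.0 / np.maximum(row_sums, 1e-12)) @ topics
```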

Proceedings Article
Chen Xing, Yuan Wang, Jie Liu, Yalou Huang, Wei-Ying Ma
12 Feb 2016
TL;DR: Experimental results show that MGe-LDA can significantly outperform state-of-the-art methods for sub-event discovery, and highlight the role of hashtags as a semantic representation of the corresponding tweets.
Abstract: Sub-event discovery is an effective method for social event analysis on Twitter. It can discover sub-events from the large amount of noisy event-related information on Twitter and represent them semantically. The task is challenging because tweets are short, informal, and noisy. To solve this problem, we consider leveraging event-related hashtags, which contain many locations, dates, and concise sub-event-related descriptions, to enhance sub-event discovery. To this end, we propose a hashtag-based mutually generative Latent Dirichlet Allocation model (MGe-LDA). In MGe-LDA, hashtags and topics of a tweet are mutually generated by each other. The mutually generative process models the relationship between hashtags and topics of tweets, and highlights the role of hashtags as a semantic representation of the corresponding tweets. Experimental results show that MGe-LDA can significantly outperform state-of-the-art methods for sub-event discovery.

48 citations
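MGe-LDA's mutually generative coupling of hashtags and topics is not reproduced here; as a loose, hedged approximation, one can fold hashtags into the token stream so that a plain LDA must explain words and hashtags jointly. This is a baseline sketch, not the paper's model, and the tweets, tags, and parameters are invented for illustration.

```python
# Crude approximation: treat hashtags as ordinary tokens for plain LDA.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

tweets = [
    ("flooding downtown after the storm", ["#sandy", "#nyc"]),
    ("power outage on the east side", ["#sandy", "#outage"]),
]

# Append hashtags to the text; the '#' keeps them distinct from words.
texts = [text + " " + " ".join(tags) for text, tags in tweets]

vec = CountVectorizer(token_pattern=r"#?\w+")
X = vec.fit_transform(texts)
sub_events = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
```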

Posted ContentDOI
TL;DR: This narrative review and quantitative synthesis compares five predominant closed- and open-vocabulary methods, examining the linguistic features associated with gender, age, and personality across all five using an existing dataset of Facebook status updates and self-reported survey data from 65,896 users.
Abstract: Technology now makes it possible to understand efficiently and at large scale how people use language to reveal their everyday thoughts, behaviors, and emotions. Written text has been analyzed through both theory-based, closed-vocabulary methods from the social sciences as well as data-driven, open-vocabulary methods from computer science, but these approaches have not been comprehensively compared. To provide guidance on best practices for automatically analyzing written text, this narrative review and quantitative synthesis compares five predominant closed- and open-vocabulary methods: Linguistic Inquiry and Word Count (LIWC), the General Inquirer, DICTION, Latent Dirichlet Allocation, and Differential Language Analysis. We compare the linguistic features associated with gender, age, and personality across the five methods using an existing dataset of Facebook status updates and self-reported survey data from 65,896 users. Results are fairly consistent across methods. The closed-vocabulary approaches efficiently summarize concepts and are helpful for understanding how people think, with LIWC2015 yielding the strongest, most parsimonious results. Open-vocabulary approaches reveal more specific and concrete patterns across a broad range of content domains, better address ambiguous word senses, and are less prone to misinterpretation, suggesting that they are well-suited for capturing the nuances of everyday psychological processes. We detail several errors that can occur in closed-vocabulary analyses; the impact of sample size, number of words per user, and number of topics included in open-vocabulary analyses; and implications of different analytical decisions. We conclude with recommendations for researchers, advocating for a complementary approach that combines closed- and open-vocabulary methods.

48 citations
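The closed- versus open-vocabulary distinction the review draws can be made concrete in a few lines: a closed-vocabulary method counts hits against a predefined dictionary, while an open-vocabulary method learns its features from the data. The toy category below is an assumption for illustration, not an actual LIWC category.

```python
# Sketch: LIWC-style closed-vocabulary scoring (toy dictionary).
POSITIVE = {"happy", "great", "love"}  # invented category, not LIWC

def closed_vocab_score(text):
    """Fraction of tokens that fall in the predefined category."""
    tokens = text.lower().split()
    hits = sum(tok in POSITIVE for tok in tokens)
    return hits / max(len(tokens), 1)

print(closed_vocab_score("I love this great sunny day"))  # 2/6 = 0.333...

# An open-vocabulary analysis would instead fit LDA (as in the sketch near
# the top of this page) and correlate the learned topic weights with
# gender, age, or personality scores.
```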

Proceedings ArticleDOI
24 Dec 2012
TL;DR: A particle filter is introduced, which significantly improves the performance of the online MLDA, along with an unsupervised word segmentation method based on the hierarchical Pitman-Yor language model (HPYLM).
Abstract: In this paper, we propose an online algorithm for multimodal categorization based on autonomously acquired multimodal information and partial words given by human users. For multimodal concept formation, multimodal latent Dirichlet allocation (MLDA) using Gibbs sampling is extended to an online version. We introduce a particle filter, which significantly improves the performance of the online MLDA, to keep tracking good models among various models with different parameters. We also introduce an unsupervised word segmentation method based on the hierarchical Pitman-Yor language model (HPYLM). Since the HPYLM requires no predefined lexicon, we can build a robot system that learns concepts and words in a completely unsupervised manner. The proposed algorithms are implemented on a real robot and tested on real everyday objects to show the validity of the proposed system.

48 citations
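The particle filter in the paper above keeps several model hypotheses alive, reweights them as data streams in, and resamples toward the better ones. The sketch below shows that generic filtering loop with a toy stand-in for the model score; it is an assumption-laden illustration, not the paper's MLDA implementation.

```python
# Sketch: generic particle-filter loop over model hypotheses (toy model).
import numpy as np

rng = np.random.default_rng(0)
n_particles = 8
particles = [{"alpha": rng.uniform(0.1, 1.0)} for _ in range(n_particles)]
weights = np.full(n_particles, 1.0 / n_particles)

def log_likelihood(model, batch):
    """Placeholder score: how well one hypothesis explains a data batch."""
    return -abs(model["alpha"] - batch.mean())  # toy stand-in, not MLDA

for batch in rng.random((5, 10)):  # stream of incoming data batches
    # Reweight each hypothesis by its fit to the new batch.
    weights *= np.exp([log_likelihood(p, batch) for p in particles])
    weights /= weights.sum()
    # Resample: duplicate strong models, drop weak ones.
    idx = rng.choice(n_particles, size=n_particles, p=weights)
    particles = [dict(particles[i]) for i in idx]
    weights = np.full(n_particles, 1.0 / n_particles)
```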


Network Information
Related Topics (5)
- Cluster analysis: 146.5K papers, 2.9M citations (86% related)
- Support vector machine: 73.6K papers, 1.7M citations (86% related)
- Deep learning: 79.8K papers, 2.1M citations (85% related)
- Feature extraction: 111.8K papers, 2.1M citations (84% related)
- Convolutional neural network: 74.7K papers, 2M citations (83% related)
Performance Metrics
No. of papers in the topic in previous years
Year | Papers
2023 | 323
2022 | 842
2021 | 418
2020 | 429
2019 | 473
2018 | 446