Topic

Latent Dirichlet allocation

About: Latent Dirichlet allocation is a research topic. Over its lifetime, 5,351 publications have been published within this topic, receiving 212,555 citations. The topic is also known as: LDA.


Papers
Posted Content
TL;DR: The authors propose a variant of Latent Dirichlet Allocation in which the topic proportions for a document are replaced by synset proportions, and further exploit the information in WordNet by assigning a non-uniform prior to each synset's distribution over words and a logistic-normal prior to the document's distribution over synsets.
Abstract: Word Sense Disambiguation is an open problem in Natural Language Processing that is particularly challenging and useful in the unsupervised setting, where all the words in any given text need to be disambiguated without using any labeled data. Typically, WSD systems use the sentence or a small window of words around the target word as the context for disambiguation, because their computational complexity scales exponentially with the size of the context. In this paper, we leverage the formalism of topic models to design a WSD system that scales linearly with the number of words in the context. As a result, our system is able to utilize the whole document as the context for a word to be disambiguated. The proposed method is a variant of Latent Dirichlet Allocation in which the topic proportions for a document are replaced by synset proportions. We further utilize the information in WordNet by assigning a non-uniform prior to the synset distribution over words and a logistic-normal prior to the document distribution over synsets. We evaluate the proposed method on the Senseval-2, Senseval-3, SemEval-2007, SemEval-2013 and SemEval-2015 English All-Words WSD datasets and show that it outperforms the state-of-the-art unsupervised knowledge-based WSD system by a significant margin.

45 citations
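The whole-document disambiguation scheme above builds on standard LDA inference of per-document topic proportions, which the paper reinterprets as synset proportions. Below is a minimal sketch of that base step, assuming gensim and a two-document toy corpus for illustration; it is not the authors' implementation, which additionally uses WordNet-derived priors and a logistic-normal prior.

```python
# A minimal sketch, not the authors' implementation: plain LDA with gensim,
# reading off per-document topic proportions, which the paper's variant
# reinterprets as synset proportions. Corpus and settings are toy placeholders.
from gensim import corpora, models

docs = [
    ["bank", "river", "water", "flow"],
    ["bank", "loan", "interest", "deposit"],
]

dictionary = corpora.Dictionary(docs)
bow_corpus = [dictionary.doc2bow(doc) for doc in docs]

# Plain LDA uses a symmetric Dirichlet prior here; the paper instead places a
# logistic-normal prior on the document-level mixture and a non-uniform,
# WordNet-derived prior on each synset's distribution over words.
lda = models.LdaModel(bow_corpus, num_topics=2, id2word=dictionary,
                      alpha="symmetric", passes=10, random_state=0)

# Document-level mixture weights: in the WSD variant these would be read as
# synset proportions, and each word's dominant synset gives its sense.
for i, bow in enumerate(bow_corpus):
    print(i, lda.get_document_topics(bow, minimum_probability=0.0))
```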

Journal ArticleDOI
TL;DR: A quantitative method for patent development map (PDM) generation is proposed; it contributes to quantifying PDM construction and can serve as a useful monitoring tool for effectively understanding technologies that span massive numbers of patents.

45 citations

Proceedings ArticleDOI
02 Nov 2007
TL;DR: A method for ontology mapping is proposed that, to a greater extent than other approaches, discovers and exploits sets of latent features to approximate the intended meaning of ontology elements, by applying the reverse generative process of the Latent Dirichlet Allocation model.
Abstract: This paper proposes a method for the mapping of ontologies that, to a greater extent than other approaches, discovers and exploits sets of latent features for approximating the intended meaning of ontology elements. This is done by applying the reverse generative process of the Latent Dirichlet Allocation model. Similarity between element pairs is computed by means of the Kullback-Leibler divergence measure. Experimental results show the potential of the method.

44 citations
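The similarity computation mentioned in the abstract above, Kullback-Leibler divergence between latent-feature distributions of ontology elements, can be sketched as follows. The two distributions are illustrative placeholders rather than outputs of the paper's reverse generative process, and the symmetrised value is one common convention, not necessarily the variant the authors use.

```python
# Illustrative placeholders, not output of the paper's reverse generative
# process: two topic distributions over the same latent features, compared
# with Kullback-Leibler divergence as described in the abstract.
import numpy as np
from scipy.stats import entropy  # entropy(p, q) computes KL(p || q)

p = np.array([0.70, 0.20, 0.05, 0.05])  # latent-feature distribution, element A
q = np.array([0.60, 0.25, 0.10, 0.05])  # latent-feature distribution, element B

kl_pq = entropy(p, q)            # KL divergence is asymmetric ...
kl_qp = entropy(q, p)
sym_kl = 0.5 * (kl_pq + kl_qp)   # ... so a symmetrised value is often used

print(f"KL(p||q)={kl_pq:.4f}  KL(q||p)={kl_qp:.4f}  symmetric={sym_kl:.4f}")
```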

Journal ArticleDOI
Shinjee Pyo, Eunhui Kim, Munchurl Kim
TL;DR: A unified topic model is proposed that groups similar TV users and recommends TV programs as a social TV service; it is shown to be superior, in both topic modeling performance and recommendation performance, to two related topic models, the polylingual topic model and the bilingual topic model.
Abstract: Social TV is a social media service via TV and social networks through which TV users exchange their experiences about the TV programs they are viewing. For social TV services, two technical aspects are envisioned: grouping similar TV users to create social TV communities and recommending TV programs based on group and personal interests to personalize TV. In this paper, we propose a unified topic model that groups similar TV users and recommends TV programs as a social TV service. The proposed unified topic model employs two latent Dirichlet allocation (LDA) models. One is a topic model of TV users, and the other is a topic model of the description words for viewed TV programs. The two LDA models are then integrated via a topic proportion parameter for TV programs, which enforces the grouping of similar TV users and of the associated description words for watched TV programs at the same time in a unified topic modeling framework. The unified model identifies the semantic relation between TV user groups and TV program description word groups so that more meaningful TV program recommendations can be made. The unified topic model also overcomes the item ramp-up problem, so that new TV programs can be reliably recommended to TV users. Furthermore, from the topic model of TV users, TV users with similar tastes can be grouped into topics, which can then be recommended as social TV communities. To verify the proposed method of unified topic-modeling-based TV user grouping and TV program recommendation for social TV services, our experiments used real TV viewing history data and electronic program guide data from a seven-month period collected by a TV poll agency. The experimental results show that the proposed unified topic model yields an average precision of 81.4% for 50 topics in TV program recommendation, on average 6.5% higher than that of the topic model of TV users only. For TV user prediction with new TV programs, the average prediction precision was 79.6%. We also show the superiority of the proposed model, in terms of both topic modeling performance and recommendation performance, over two related topic models, the polylingual topic model and the bilingual topic model.

44 citations
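A rough sketch of the coupling idea behind the unified model above, assuming gensim and toy data: treating each TV program as a single document that mixes the IDs of users who watched it with its description words lets one LDA learn topics shared across users and words. This is only an approximation of the authors' model, which couples two separate LDAs through a topic proportion parameter.

```python
# A rough approximation, not the authors' unified model: treat each TV program
# as one document mixing the IDs of users who watched it with its description
# words, so a single gensim LDA learns topics shared across both vocabularies.
from gensim import corpora, models

programs = {  # toy placeholder data
    "news_at_9":    ["user_1", "user_2", "politics", "economy", "anchor"],
    "football_cup": ["user_2", "user_3", "soccer", "league", "final"],
    "cook_show":    ["user_1", "user_4", "recipe", "kitchen", "chef"],
}

dictionary = corpora.Dictionary(programs.values())
corpus = [dictionary.doc2bow(tokens) for tokens in programs.values()]

lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary,
                      passes=20, random_state=0)

# Each topic now mixes user IDs and description words, so candidate user
# groups (social TV communities) and program word groups emerge from the same
# topic space, which is the effect the unified model formalises.
for topic_id, terms in lda.show_topics(num_topics=2, num_words=5,
                                       formatted=False):
    print(topic_id, [term for term, _ in terms])
```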

Book ChapterDOI
03 Apr 2014
TL;DR: A plot-based recommendation system is proposed, based on evaluating the similarity between the plot of a video watched by a user and a large number of plots stored in a movie database; it is able to propose famous and beloved movies as well as old or little-known movies/programs that are still strongly related to the content of the video the user has watched.
Abstract: We propose a plot-based recommendation system, which is based on an evaluation of the similarity between the plot of a video that was watched by a user and a large number of plots stored in a movie database. Our system is independent of the number of user ratings, so it is able to propose famous and beloved movies as well as old or little-known movies/programs that are still strongly related to the content of the video the user has watched. The system implements and compares two topic models, Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA), on a movie database of two hundred thousand plots constructed by integrating different movie databases into a local NoSQL (MongoDB) DBMS. The behaviour of the topic models has been examined on the basis of standard metrics and user evaluations, and performance assessments with 30 users have been conducted to compare our tool with a commercial system.

44 citations
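The plot-similarity pipeline described above can be sketched with gensim's LSI and LDA implementations and a cosine-similarity index; the three toy plots, the two-topic models, and the query below are placeholders standing in for the chapter's corpus of two hundred thousand plots stored in MongoDB.

```python
# A minimal sketch of the plot-similarity pipeline, assuming gensim; the toy
# plots, two-topic models, and query stand in for the chapter's corpus of
# two hundred thousand plots stored in MongoDB.
from gensim import corpora, models, similarities

plots = [
    "a retired cop hunts a serial killer through the city",
    "a detective investigates a string of murders in a small town",
    "two robots fall in love while cleaning an abandoned planet",
]
texts = [plot.lower().split() for plot in plots]

dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]

lsa = models.LsiModel(corpus, id2word=dictionary, num_topics=2)   # LSA / LSI
lda = models.LdaModel(corpus, id2word=dictionary, num_topics=2,
                      random_state=0)

watched = dictionary.doc2bow("a cop chases a killer".lower().split())

# Rank the stored plots against the watched one under each topic model.
for name, model in [("LSA", lsa), ("LDA", lda)]:
    index = similarities.MatrixSimilarity(model[corpus], num_features=2)
    print(name, index[model[watched]])  # cosine similarity to each stored plot
```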


Network Information
Related Topics (5)
Cluster analysis: 146.5K papers, 2.9M citations (86% related)
Support vector machine: 73.6K papers, 1.7M citations (86% related)
Deep learning: 79.8K papers, 2.1M citations (85% related)
Feature extraction: 111.8K papers, 2.1M citations (84% related)
Convolutional neural network: 74.7K papers, 2M citations (83% related)
Performance
Metrics
No. of papers in the topic in previous years
Year: Papers
2023: 323
2022: 850
2021: 420
2020: 429
2019: 473
2018: 447