Topic

Latent Dirichlet allocation

About: Latent Dirichlet allocation is a research topic. Over its lifetime, 5,351 publications have been published within this topic, receiving 212,555 citations. The topic is also known as: LDA.


Papers
Journal ArticleDOI
TL;DR: In this paper, an unsupervised machine learning technique based on the probabilistic generative model of Latent Dirichlet Allocation is proposed to learn the underlying structure of collider events directly from the data.
Abstract: We describe a technique to learn the underlying structure of collider events directly from the data, without having a particular theoretical model in mind. It allows one to infer aspects of the theoretical model that may have given rise to this structure, and can be used to cluster or classify the events for analysis purposes. The unsupervised machine-learning technique is based on the probabilistic (Bayesian) generative model of Latent Dirichlet Allocation. We pair the model with an approximate inference algorithm called Variational Inference, which we then use to extract the latent probability distributions describing the learned underlying structure of collider events. We provide a detailed systematic study of the technique using two example scenarios to learn the latent structure of di-jet event samples made up of QCD background events and either $$ t\overline{t} $$ or hypothetical W′ → (ϕ → WW)W signal events.
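A minimal sketch of this recipe, using scikit-learn's LatentDirichletAllocation (which is trained with variational Bayes): collider events are assumed to have already been reduced to counts over a vocabulary of binned observables, and the toy Poisson counts below stand in for that representation. None of this reproduces the authors' actual pipeline.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)

# Assumption: each event is a "bag of words" of counts over a fixed
# vocabulary of binned substructure observables (toy data here).
n_events, vocab_size = 1000, 50
event_counts = rng.poisson(lam=2.0, size=(n_events, vocab_size))

# Two latent "themes" (e.g., background-like vs. signal-like structure);
# scikit-learn fits LDA with variational inference.
lda = LatentDirichletAllocation(n_components=2, learning_method="batch",
                                random_state=0)
theta = lda.fit_transform(event_counts)  # per-event topic proportions
beta = lda.components_                   # per-topic word weights (unnormalized)

# Events can then be clustered/classified by their dominant latent theme.
labels = theta.argmax(axis=1)
```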

32 citations

Proceedings ArticleDOI
24 Sep 2007
TL;DR: Experimental results indicate that extending a traditional language-model-based approach to information retrieval with a grounded language model nearly doubles performance on a held-out test set.
Abstract: This paper presents a methodology for automatically indexing a large corpus of broadcast baseball games using an unsupervised content-based approach. The method relies on learning a grounded language model which maps query terms to the non-linguistic context to which they refer. Grounded language models are learned from a large, unlabeled corpus of video events. Events are represented using a codebook of automatically discovered temporal patterns of low-level features extracted from the raw video. These patterns are associated with words extracted from the closed-captioning text using a generalization of Latent Dirichlet Allocation. We evaluate the benefit of the grounded language model by extending a traditional language-model-based approach to information retrieval. Experimental results indicate that using a grounded language model nearly doubles performance on a held-out test set.
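A toy sketch of the grounding idea under one simplifying assumption: caption words and visual codewords are pooled into a single vocabulary, so a shared LDA topic space links query terms to non-linguistic context. The event strings, the vis_ token prefix, and the affinity score are all invented for illustration, not the paper's model.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Each "document" mixes closed-caption words with discovered visual
# pattern IDs (prefixed vis_) from the same video event.
events = [
    "homerun crowd cheer vis_17 vis_42 vis_42",
    "strikeout pitcher vis_03 vis_08",
    "homerun swing vis_17 vis_42",
    "strikeout swing vis_03 vis_08 vis_08",
]
vec = CountVectorizer(token_pattern=r"\S+")
X = vec.fit_transform(events)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# Shared topics tie query terms to visual patterns: score every vocabulary
# item against "homerun" via sum_k p(w|k) * p(homerun|k).
vocab = np.array(vec.get_feature_names_out())
topic_word = lda.components_ / lda.components_.sum(axis=1, keepdims=True)
qi = list(vocab).index("homerun")
grounding = topic_word.T @ topic_word[:, qi]
print(vocab[np.argsort(grounding)[::-1][:5]])  # patterns grounded in the query
```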

32 citations

Journal ArticleDOI
TL;DR: A new perspective in dynamic classification for SITS is offered, and several studies on forms of relief, weather forecasts, and very high resolution images are used to explain the wide range of structures responsible for influencing the dynamics inside the resolution cell.
Abstract: With a continuous increase in the number of Earth Observation satellites, leading to the development of satellite image time series (SITS), the number of algorithms for land cover analysis and monitoring has greatly expanded. This paper offers a new perspective in dynamic classification for SITS. Four similarity measures (correlation coefficient, Kullback-Leibler divergence, conditional information, and normalized compression distance) based on consecutive image pairs from the data are employed. These measures employ linear dependences, statistical measures, and spatial relationships to compute radiometric, spectral, and texture changes that offer a description of the multitemporal behavior of the SITS. During this process, the original SITS is converted to a change map time series (CMTS), which removes the static information from the data set. The CMTS is analyzed using a latent Dirichlet allocation (LDA) model capable of discovering classes with semantic meaning based on the latent information hidden in the scene. This statistical method was originally used for text classification, thus requiring a word-document-corpus analogy with the elements inside the image. The experimental results were computed using 11 Landsat images over the city of Bucharest and surrounding areas. The LDA model enables us to discover a wide range of scene evolution classes based on the various dynamic behaviors of the land cover. The results are compared with the CORINE Land Cover map; however, this is not a validation method but rather one that adds static knowledge about the general usage of the analyzed area. To help the interpretation of the results, we use several studies on forms of relief, weather forecasts, and very high resolution images that can explain the wide range of structures responsible for influencing the dynamics inside the resolution cell.
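A much-simplified sketch of the CMTS construction: change values between consecutive images are computed per patch, quantized into a tiny vocabulary of change "words", and each patch's change history becomes a document for LDA. Using only the correlation coefficient, an 8x8 patch size, and random stand-in imagery are all illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)
sits = rng.random((11, 64, 64))  # 11 images, as in the paper's test site
P = 8                            # patch ("resolution cell") size

def patch_corr(a, b):
    """Correlation coefficient between co-located patches of two images."""
    out = []
    for i in range(0, a.shape[0], P):
        for j in range(0, a.shape[1], P):
            pa, pb = a[i:i+P, j:j+P].ravel(), b[i:i+P, j:j+P].ravel()
            out.append(np.corrcoef(pa, pb)[0, 1])
    return np.array(out)

# Change map time series: one change value per patch per consecutive pair.
cmts = np.stack([patch_corr(sits[t], sits[t + 1]) for t in range(len(sits) - 1)])

# Quantize change values into 4 "words" and count them per patch (document).
words = np.digitize(cmts, np.quantile(cmts, [0.25, 0.5, 0.75]))
counts = np.stack([np.bincount(words[:, p], minlength=4)
                   for p in range(words.shape[1])])

lda = LatentDirichletAllocation(n_components=3, random_state=0)
evolution_class = lda.fit_transform(counts).argmax(axis=1)  # per-patch class
```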

32 citations

Journal ArticleDOI
01 Mar 2014
TL;DR: The proposed sLDA not only reduces the model perplexity but also reduces memory and computation costs, and the Bayesian feature selection method effectively identifies relevant topic words for building a sparse topic model.
Abstract: This paper presents a new Bayesian sparse learning approach to select salient lexical features for sparse topic modeling. The Bayesian learning based on latent Dirichlet allocation (LDA) is performed by incorporating spike-and-slab priors. According to this sparse LDA (sLDA), the spike distribution is used to select salient words, while the slab distribution is applied to establish the latent topic model based on those selected relevant words. A variational inference procedure is developed to estimate the prior parameters for sLDA. In experiments on document modeling using LDA and sLDA, we find that the proposed sLDA not only reduces the model perplexity but also reduces memory and computation costs. The Bayesian feature selection method effectively identifies relevant topic words for building a sparse topic model.
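A crude stand-in for the sLDA mechanism, since full spike-and-slab variational inference is beyond a short sketch: here a simple corpus statistic plays the role of the spike that selects salient words, and a standard LDA over the reduced vocabulary plays the role of the slab. The saliency proxy and the toy counts are assumptions, not the paper's method.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)
X = rng.poisson(0.5, size=(200, 300))  # toy document-term counts

# Crude saliency proxy: keep words whose counts vary across documents
# (above-median variance stands in for the spike indicator here).
saliency = X.var(axis=0)
selected = saliency > np.median(saliency)

# "Slab": a standard LDA over the selected (sparse) vocabulary only,
# which shrinks both memory and per-iteration cost.
lda = LatentDirichletAllocation(n_components=10, random_state=0)
doc_topic = lda.fit_transform(X[:, selected])
print(f"vocabulary reduced from {X.shape[1]} to {selected.sum()} words")
```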

32 citations

Journal ArticleDOI
TL;DR: A probabilistic topic model is proposed, adapted from Latent Dirichlet Allocation (LDA), to discover representative and interpretable activity categorization from individual-level spatiotemporal data in an unsupervised manner and can successfully distinguish the three most basic types of activities.
Abstract: Although automatically collected human travel records can accurately capture the time and location of human movements, they do not directly explain the hidden semantic structures behind the data, e.g., activity types. This work proposes a probabilistic topic model, adapted from Latent Dirichlet Allocation (LDA), to discover representative and interpretable activity categorization from individual-level spatiotemporal data in an unsupervised manner. Specifically, the activity-travel episodes of an individual user are treated as words in a document, and each topic is a distribution over space and time that corresponds to a certain type of activity. The model accounts for a mixture of discrete and continuous attributes: the location, start time of day, start day of week, and duration of each activity episode. The proposed methodology is demonstrated using pseudonymized transit smart card data from London, U.K. The results show that the model can successfully distinguish the three most basic types of activities: home, work, and other. As the specified number of activity categories increases, more specific subpatterns for home and work emerge, and both the goodness of fit and the predictive performance for travel behavior improve. This work makes it possible to enrich human mobility data with representative and interpretable activity patterns without relying on predefined activity categories or heuristic rules.
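A rough sketch of the adaptation under one big simplification: the paper keeps start time and duration continuous, whereas here every attribute of an episode (zone, start hour, day type, duration) is discretized into a composite word, with one document per user. All names and data below are synthetic.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)

def episode_word(zone, start_hour, weekday, dur_h):
    """Collapse one activity episode into a discrete composite token."""
    day = "wk" if weekday < 5 else "we"
    return f"z{zone}_h{start_hour // 4}_{day}_d{min(int(dur_h // 4), 3)}"

# One document per user: a space-joined bag of episode words.
docs = []
for _ in range(300):
    eps = [episode_word(rng.integers(20), rng.integers(24),
                        rng.integers(7), rng.uniform(0, 16))
           for _ in range(40)]
    docs.append(" ".join(eps))

X = CountVectorizer(token_pattern=r"\S+").fit_transform(docs)
lda = LatentDirichletAllocation(n_components=3, random_state=0)  # home/work/other
user_mix = lda.fit_transform(X)  # each user's mix over activity categories
```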

32 citations


Network Information
Related Topics (5)
Cluster analysis: 146.5K papers, 2.9M citations, 86% related
Support vector machine: 73.6K papers, 1.7M citations, 86% related
Deep learning: 79.8K papers, 2.1M citations, 85% related
Feature extraction: 111.8K papers, 2.1M citations, 84% related
Convolutional neural network: 74.7K papers, 2M citations, 83% related
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    323
2022    842
2021    418
2020    429
2019    473
2018    446