
Showing papers on "Latent Dirichlet allocation published in 1999"


Journal ArticleDOI
01 Aug 1999
TL;DR: Probabilistic Latent Semantic Indexing is a novel approach to automated document indexing which is based on a statistical latent class model for factor analysis of count data.
Abstract: Probabilistic Latent Semantic Indexing is a novel approach to automated document indexing which is based on a statistical latent class model for factor analysis of count data. Fitted from a training corpus of text documents by a generalization of the Expectation Maximization algorithm, the utilized model is able to deal with domain-specific synonymy as well as with polysemous words. In contrast to standard Latent Semantic Indexing (LSI) by Singular Value Decomposition, the probabilistic variant has a solid statistical foundation and defines a proper generative data model. Retrieval experiments on a number of test collections indicate substantial performance gains over direct term matching methods as well as over LSI. In particular, the combination of models with different dimensionalities has proven to be advantageous.
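For orientation, the aspect model usually associated with Probabilistic Latent Semantic Indexing can be sketched as follows; this is the standard formulation of such a latent class model, not text taken from the paper. Each observed document-word pair (d, w) is generated through a latent class z:

P(d, w) = P(d) \sum_{z} P(z \mid d) \, P(w \mid z),

and the parameters are fitted by (a generalization of) EM, alternating between the posterior over latent classes,

P(z \mid d, w) = \frac{P(z \mid d)\, P(w \mid z)}{\sum_{z'} P(z' \mid d)\, P(w \mid z')},

and the re-estimation of P(w \mid z) and P(z \mid d) from the resulting expected counts.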

4,577 citations


Proceedings Article
30 Jul 1999
TL;DR: This work proposes a widely applicable generalization of maximum likelihood model fitting by tempered EM, based on a mixture decomposition derived from a latent class model, which results in a more principled approach with a solid foundation in statistics.
Abstract: Probabilistic Latent Semantic Analysis is a novel statistical technique for the analysis of two-mode and co-occurrence data, which has applications in information retrieval and filtering, natural language processing, machine learning from text, and in related areas. Compared to standard Latent Semantic Analysis which stems from linear algebra and performs a Singular Value Decomposition of co-occurrence tables, the proposed method is based on a mixture decomposition derived from a latent class model. This results in a more principled approach which has a solid foundation in statistics. In order to avoid overfitting, we propose a widely applicable generalization of maximum likelihood model fitting by tempered EM. Our approach yields substantial and consistent improvements over Latent Semantic Analysis in a number of experiments.
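A minimal sketch of tempered EM for such an aspect model is given below, assuming the symmetric parametrization P(d, w) = \sum_z P(z) P(d \mid z) P(w \mid z) and the commonly quoted tempered E-step in which the likelihood part of the class posterior is raised to a temperature beta; the function name, defaults, and initialization are illustrative choices, not taken from the paper.

import numpy as np

def tempered_em_aspect_model(counts, n_topics, beta=0.8, n_iter=50, seed=0):
    """Illustrative tempered EM for a PLSA-style aspect model.

    counts : (n_docs, n_words) array of term frequencies n(d, w).
    beta   : temperature; beta = 1 recovers ordinary EM.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    # Random initialization of P(z), P(d|z), P(w|z).
    p_z = np.full(n_topics, 1.0 / n_topics)
    p_d_z = rng.random((n_topics, n_docs))
    p_d_z /= p_d_z.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # Tempered E-step: P(z|d,w) proportional to P(z) * [P(d|z) P(w|z)]**beta.
        joint = p_z[:, None, None] * (p_d_z[:, :, None] * p_w_z[:, None, :]) ** beta
        post = joint / joint.sum(axis=0, keepdims=True)   # shape (z, d, w)
        # M-step: re-estimate parameters from expected counts n(d,w) * P(z|d,w).
        weighted = counts[None, :, :] * post
        p_w_z = weighted.sum(axis=1)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True)
        p_d_z = weighted.sum(axis=2)
        p_d_z /= p_d_z.sum(axis=1, keepdims=True)
        p_z = weighted.sum(axis=(1, 2))
        p_z /= p_z.sum()
    return p_z, p_d_z, p_w_z

With beta = 1 this is ordinary EM; in tempered EM beta is typically lowered gradually and chosen by held-out performance, which is what controls overfitting.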

2,306 citations


Proceedings ArticleDOI
01 Aug 1999
TL;DR: A dual probability model is constructed for the Latent Semantic Indexing using the cosine similarity measure, establishing a statistical framework for LSI and leading to a statistical criterion for the optimal semantic dimensions.
Abstract: A dual probability model is constructed for Latent Semantic Indexing (LSI) using the cosine similarity measure. Both the document-document similarity matrix and the term-term similarity matrix naturally arise from the maximum likelihood estimation of the model parameters, and the optimal solutions are the latent semantic vectors of LSI. Dimensionality reduction is justified by the statistical significance of latent semantic vectors as measured by the likelihood of the model. This leads to a statistical criterion for the optimal semantic dimensions, answering a critical open question in LSI with practical importance. Thus the model establishes a statistical framework for LSI. Ambiguities related to statistical modeling of LSI are clarified.
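As background for the "dual" construction, the standard LSI machinery (standard material, not specific to this paper) starts from a term-document matrix X and its truncated SVD X \approx U_k \Sigma_k V_k^{\top}, so that the two similarity matrices share one spectral structure:

X^{\top} X \approx V_k \Sigma_k^{2} V_k^{\top}, \qquad X X^{\top} \approx U_k \Sigma_k^{2} U_k^{\top}.

The columns of U_k and V_k are the latent semantic vectors; the contribution summarized above is that these vectors also arise as maximum likelihood solutions of a dual probability model, which in turn yields a likelihood-based criterion for choosing the number of dimensions k.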

152 citations


Journal ArticleDOI
TL;DR: In this article, flexible methods that relax restrictive conditional independence assumptions of latent class analysis (LCA) are described, and the relationship between the multivariate probit mixture model proposed here and Rost's mixed Rasch (1990, 1991) model is discussed.
Abstract: Flexible methods that relax restrictive conditional independence assumptions of latent class analysis (LCA) are described. Dichotomous and ordered category manifest variables are viewed as discretized latent continuous variables. The latent continuous variables are assumed to have a mixture-of-multivariate-normals distribution. Within a latent class, conditional dependence is modeled as the mutual association of all or some latent continuous variables with a continuous latent trait (or in special cases, multiple latent traits). The relaxation of conditional independence assumptions allows LCA to better model natural taxa. Comparisons of specific restricted and unrestricted models permit statistical tests of specific aspects of latent taxonic structure. Latent class, latent trait, and latent distribution analysis can be viewed as special cases of the mixed latent trait model. The relationship between the multivariate probit mixture model proposed here and Rost's mixed Rasch (1990, 1991) model is discussed. Two...
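A schematic version of the kind of model described here, written in notation of our own choosing rather than the paper's, treats each dichotomous manifest variable y_j as a thresholded latent continuous variable:

y_j = 1 \iff y_j^{*} > \tau_j, \qquad \mathbf{y}^{*} \mid \text{class } c \sim N(\boldsymbol{\mu}_c, \Sigma_c),

so that within class c any off-diagonal structure of \Sigma_c, for instance induced by a shared continuous latent trait, carries exactly the conditional dependence that ordinary LCA assumes away; taking \Sigma_c diagonal recovers the conditionally independent latent class model.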

96 citations


Journal ArticleDOI
TL;DR: In this paper, the main results obtained in semi- and non-parametric Bayesian analysis of duration models are reviewed in line with Ferguson's pioneering papers, and a Bayesian semiparametric version of the proportional hazards model is considered.
Abstract: The object of this paper is to review the main results obtained in semi- and non-parametric Bayesian analysis of duration models. Standard nonparametric Bayesian models for independent and identically distributed observations are reviewed in line with Ferguson's pioneering papers. Recent results on the characterization of Dirichlet processes and on nonparametric treatment of censoring and of heterogeneity in the context of mixtures of Dirichlet processes are also discussed. The final section considers a Bayesian semiparametric version of the proportional hazards model.
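For readers unfamiliar with the building blocks, the two objects underlying the review can be written in their standard forms (textbook definitions, not text from the paper). Ferguson's Dirichlet process prior G \sim \mathrm{DP}(\alpha, G_0) requires that for every finite measurable partition (B_1, \dots, B_k),

(G(B_1), \dots, G(B_k)) \sim \mathrm{Dirichlet}\bigl(\alpha G_0(B_1), \dots, \alpha G_0(B_k)\bigr),

and the proportional hazards model specifies the hazard rate as

h(t \mid x) = h_0(t) \exp(x^{\top} \beta).

A Bayesian semiparametric treatment keeps the parametric regression factor \exp(x^{\top}\beta) and places a nonparametric prior on the baseline component (for example, on the baseline cumulative hazard).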

16 citations


Proceedings Article
01 Jan 1999
TL;DR: This model is shown to have several advantages over the Bayesian models based on a single Dirichlet prior, especially when 2^q is large and many patterns are thus unobserved by design.
Abstract: Bayesian implicative analysis was proposed for summarizing the association in a 2×2 contingency table in possibly asymmetrical terms such as "presence of feature a implies, usually, presence of feature b" ("a quasi-implies b" in short). Here, we consider the multivariate version of this problem: having n units which are classified according to q binary questions, we want to summarize the association between questions in terms of quasi-implications between features. We will first show how, at a descriptive level, the notion of implication can be weakened into that of quasi-implication. The inductive step assumes that the n units are a sample from a 2^q-multinomial population. Uncertainty about the patterns' true frequencies is expressed by an imprecise Dirichlet model which yields upper and lower posterior probabilities for any quasi-implicative statement. This model is shown to have several advantages over the Bayesian models based on a single Dirichlet prior, especially when 2^q is large and many patterns are thus unobserved by design.
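To make the upper and lower posterior probabilities concrete, the imprecise Dirichlet model is commonly set up as follows (a standard formulation; the quasi-implication analysis of the paper is built on top of it): instead of a single Dirichlet prior, the multinomial parameter receives the whole family of \mathrm{Dirichlet}(s\,\mathbf{t}) priors with fixed prior strength s > 0 and mean vector \mathbf{t} ranging over the simplex. For an event A observed n_A times among N units, the posterior predictive probability of A is then only bounded,

\frac{n_A}{N + s} \;\le\; P(A \mid \text{data}) \;\le\; \frac{n_A + s}{N + s},

and the width s/(N + s) of this interval measures how much the inference still depends on the prior, which matters precisely when 2^q is large and many patterns are unobserved.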

2 citations