Topic

Latent variable model

About: Latent variable model is a research topic. Over its lifetime, 3,589 publications have been published within this topic, receiving 235,061 citations.


Papers
Proceedings Article
25 Jul 2010
TL;DR: A novel topic model using the Pitman-Yor (PY) process, called the PY topic model, is proposed; it captures two properties of a document: a power-law word distribution and the presence of multiple topics.
Abstract: One important approach for knowledge discovery and data mining is to estimate unobserved variables, because latent variables can indicate specific hidden properties of observed data. The latent factor model assumes that each item in a record has a latent factor; the co-occurrence of items can then be modeled by latent factors. In document modeling, a record is a document represented as a "bag of words" (meaning that the order of words is ignored), an item is a word, and a latent factor is a topic. Latent Dirichlet allocation (LDA) is a widely used Bayesian topic model that places a Dirichlet distribution over the latent topic distribution of a document having multiple topics. LDA assumes that latent topics, i.e., discrete latent variables, are distributed according to a multinomial distribution whose parameters are generated from the Dirichlet distribution. LDA also models a word distribution by using a multinomial distribution whose parameters follow the Dirichlet distribution. This Dirichlet-multinomial setting, however, cannot capture the power-law phenomenon of a word distribution, which is known as Zipf's law in linguistics. We therefore propose a novel topic model using the Pitman-Yor (PY) process, called the PY topic model. The PY topic model captures two properties of a document: a power-law word distribution and the presence of multiple topics. In an experiment using real data, this model outperformed LDA in document modeling in terms of perplexity.

77 citations
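
The abstract above contrasts LDA's Dirichlet-multinomial setting with the Pitman-Yor process. As a rough illustration only, the sketch below samples a toy corpus from the standard LDA generative process and, separately, simulates table sizes under the textbook Chinese-restaurant representation of a Pitman-Yor process. It is not the PY topic model from the paper; all function names, parameter names, and default values are assumptions made for this example.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs, rng):
    """Draw one index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_lda_corpus(n_docs, doc_len, n_topics, vocab_size,
                        alpha=0.5, beta=0.5, seed=0):
    """Sample word ids from the standard LDA generative process.

    alpha and beta are symmetric Dirichlet hyperparameters (illustrative
    values, not taken from the paper).
    """
    rng = random.Random(seed)
    # One word distribution per topic, drawn from a symmetric Dirichlet.
    topic_word = [sample_dirichlet([beta] * vocab_size, rng)
                  for _ in range(n_topics)]
    corpus = []
    for _ in range(n_docs):
        # Per-document topic proportions.
        theta = sample_dirichlet([alpha] * n_topics, rng)
        doc = []
        for _ in range(doc_len):
            z = sample_categorical(theta, rng)          # latent topic
            doc.append(sample_categorical(topic_word[z], rng))  # observed word
        corpus.append(doc)
    return corpus


def pitman_yor_table_counts(n_customers, discount, concentration, seed=0):
    """Simulate table sizes under the Pitman-Yor Chinese restaurant process.

    A new customer joins existing table k with weight (count_k - discount)
    and opens a new table with weight (concentration + discount * K).
    Assumes 0 <= discount < 1 and concentration > 0.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_customers):
        k = len(counts)
        weights = [c - discount for c in counts]
        weights.append(concentration + discount * k)
        idx = rng.choices(range(k + 1), weights=weights, k=1)[0]
        if idx == k:
            counts.append(1)
        else:
            counts[idx] += 1
    return counts
```

Sorting the output of `pitman_yor_table_counts(5000, 0.5, 1.0)` in decreasing order and plotting it on log-log axes is one informal way to look at the heavy-tailed behaviour that a positive discount tends to produce.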

Posted Content
TL;DR: A hierarchical latent variable model that individualizes predictions of disease trajectories is proposed and validated on the task of predicting the course of interstitial lung disease, a leading cause of death among patients with the autoimmune disease scleroderma.
Abstract: For many complex diseases, there is a wide variety of ways in which an individual can manifest the disease. The challenge of personalized medicine is to develop tools that can accurately predict the trajectory of an individual's disease, which can in turn enable clinicians to optimize treatments. We represent an individual's disease trajectory as a continuous-valued, continuous-time function describing the severity of the disease over time. We propose a hierarchical latent variable model that individualizes predictions of disease trajectories. This model shares statistical strength across observations at different resolutions: the population, subpopulation, and individual levels. We describe an algorithm for learning population and subpopulation parameters offline, and an online procedure for dynamically learning individual-specific parameters. Finally, we validate our model on the task of predicting the course of interstitial lung disease, a leading cause of death among patients with the autoimmune disease scleroderma. We compare our approach against the state of the art and demonstrate significant improvements in predictive accuracy.

77 citations

Journal Article
TL;DR: Findings suggest that the latent structural nature of AS can be conceptualized as a taxonic latent class structure composed of 2 types or forms of AS, each of these forms characterized by its own unique latent continuity and dimensional structure.
Abstract: This study represents an effort to better understand the latent structure of anxiety sensitivity (AS), as indexed by the 16-item Anxiety Sensitivity Index (ASI; S. Reiss, R. A. Peterson, M. Gursky, & R. J. McNally, 1986), by using taxometric and factor-analytic approaches in an integrative manner. Taxometric analyses indicated that AS has a taxonic latent class structure (i.e., a dichotomous latent class structure) in a large sample of North American adults (N=2,515). As predicted, confirmatory factor analyses indicated that a multidimensional 3-factor model of AS provided a good fit for the AS complement class (normative or low-risk form) but not the AS taxon class (high-risk form). Exploratory factor analytic results suggested that the AS taxon may demonstrate a unique, unidimensional factor solution, though there are alternative indications that it may be characterized by a 2-factor solution. Findings suggest that the latent structural nature of AS can be conceptualized as a taxonic latent class structure composed of 2 types or forms of AS, each of these forms characterized by its own unique latent continuity and dimensional structure.

77 citations

Journal Article
TL;DR: A unified rank-based approach to estimate the correlation matrix of latent variables is proposed, and a concentration inequality for the proposed rank-based estimator is established; the resulting methods achieve the same rates of convergence for precision matrix estimation and graph recovery as if the latent variables were observed.
Abstract: We propose a semiparametric latent Gaussian copula model for modelling mixed multivariate data, which contain a combination of both continuous and binary variables. The model assumes that the observed binary variables are obtained by dichotomizing latent variables that satisfy the Gaussian copula distribution. The goal is to infer the conditional independence relationship between the latent random variables, based on the observed mixed data. Our work has two main contributions: we propose a unified rank-based approach to estimate the correlation matrix of latent variables, and we establish a concentration inequality for the proposed rank-based estimator. Consequently, our methods achieve the same rates of convergence for precision matrix estimation and graph recovery as if the latent variables were observed. The proposed methods are numerically assessed through extensive simulation studies and real data analysis.

76 citations
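
To make the idea of a rank-based correlation estimate concrete, here is a minimal sketch for the simplest case only: two continuous variables under a Gaussian copula, where the latent correlation is related to Kendall's tau by the standard identity sin(pi/2 * tau). The abstract above does not say which rank statistic the authors use, and the binary and mixed cases it describes need different transformations that are not shown here. Function names are hypothetical.

```python
import math


def kendall_tau(x, y):
    """Kendall's tau (tau-a) computed by direct pair counting, O(n^2)."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                score += 1   # concordant pair
            elif s < 0:
                score -= 1   # discordant pair
    return 2.0 * score / (n * (n - 1))


def latent_correlation_continuous(x, y):
    """Rank-based latent correlation for two continuous variables.

    Uses sin(pi/2 * tau), which holds under a Gaussian copula for
    continuous pairs; it is invariant to monotone transformations of
    either variable because it depends on the data only through ranks.
    """
    return math.sin(math.pi / 2.0 * kendall_tau(x, y))
```

For example, `latent_correlation_continuous(x, [v ** 3 for v in x])` depends only on the ordering of `x`, so a strictly increasing transformation such as cubing leaves the estimate unchanged.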

Monograph
01 Jan 2002
TL;DR: An edited volume on latent variable and structural equation modelling, with chapters covering topics such as nonparametric IRT models, latent trait estimation for binary data, missing data and misspecification in SEM, robust factor analysis, and multilevel models.
Abstract: Contents: Preface. D.J. Bartholomew, Old and New Approaches to Latent Variable Modelling. I. Moustaki, C. O'Muircheartaigh, Locating "Don't Know," "No Answer" and Middle Alternatives on an Attitude Scale: A Latent Variable Approach. L.A. van der Ark, B.T. Hemker, K. Sijtsma, Hierarchically Related Nonparametric IRT Models, and Practical Data Analysis Methods. P. Tzamourani, M. Knott, Fully Semiparametric Estimation of the Two-Parameter Latent Trait Model for Binary Data. P. Rivera, A. Satorra, Analyzing Group Differences: A Comparison of SEM Approaches. R.D. Wiggins, A. Sacker, Strategies for Handling Missing Data in SEM: A User's Perspective. T. Raykov, S. Penev, Exploring Structural Equation Model Misspecifications Via Latent Individual Residuals. J-Q. Shi, S-Y. Lee, B-C. Wei, On Confidence Regions of SEM Models. P. Filzmoser, Robust Factor Analysis: Methods and Applications. M. Croon, Using Predicted Latent Scores in General Latent Structure Models. H. Goldstein, W. Browne, Multilevel Factor Analysis Modelling Using Markov Chain Monte Carlo Estimation. J-P. Fox, C.A.W. Glas, Modelling Measurement Error in Structural Multilevel Models.

76 citations


Network Information
Related Topics (5)
Statistical hypothesis testing: 19.5K papers, 1M citations, 82% related
Inference: 36.8K papers, 1.3M citations, 81% related
Multivariate statistics: 18.4K papers, 1M citations, 80% related
Linear model: 19K papers, 1M citations, 80% related
Estimator: 97.3K papers, 2.6M citations, 78% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2023    75
2022    143
2021    137
2020    185
2019    142
2018    159