Topic

Probabilistic latent semantic analysis

About: Probabilistic latent semantic analysis (PLSA) is a research topic. Over its lifetime, 2,884 publications have appeared within this topic, receiving 198,341 citations. The topic is also known as: PLSA.
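PLSA models each document-word co-occurrence as a mixture over latent topics, P(w|d) = Σ_z P(z|d) P(w|z), with parameters fitted by EM. The following is a minimal NumPy sketch of that EM loop; function and variable names are illustrative, not taken from any paper listed below.

```python
import numpy as np

def plsa(counts, n_topics, n_iter=100, seed=0):
    """Fit PLSA by EM on a document-word count matrix (docs x words).

    Returns P(z|d) with shape (docs, topics) and P(w|z) with
    shape (topics, words).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape

    # Random initialisation of P(z|d) and P(w|z), each row normalised.
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # E-step: responsibilities P(z|d,w) ∝ P(z|d) P(w|z),
        # stored as a (docs, topics, words) tensor.
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]
        joint /= joint.sum(axis=1, keepdims=True) + 1e-12

        # M-step: reweight by the observed counts n(d,w).
        weighted = counts[:, None, :] * joint
        p_w_z = weighted.sum(axis=0)                     # sum over docs
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_d = weighted.sum(axis=2)                     # sum over words
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12

    return p_z_d, p_w_z
```

On a small count matrix with two disjoint word blocks, the recovered topics concentrate on one block each; real corpora would use sparse matrices rather than the dense (docs, topics, words) tensor above.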


Papers
Journal ArticleDOI
TL;DR: Experimental results show that the proposed latent feature-based transfer learning (TL) strategy has a significant advantage in gear fault diagnosis, especially under varying working conditions.
Abstract: Gears are often operated under various working conditions, which may cause the training and testing data to have different but related distributions when conducting gear fault diagnosis. To address this issue, a latent feature-based transfer learning (TL) strategy is proposed in this paper. First, the bag-of-fault-words (BOFW) model combined with the continuous wavelet transform (CWT) method is developed to extract and represent every fault feature parameter as a histogram. Before identifying the gear fault, the latent feature-based TL strategy is carried out, which adopts the joint dual-probabilistic latent semantic analysis (JD-PLSA) to model the shared and domain-specific latent features. After that, a mapping matrix between two domains can be constructed by using Pearson’s correlation coefficients (PCCs) to effectively transfer shared and mapped domain-specific latent knowledge and to reduce the gap between the two domains. Then, a Fisher kernel-based support vector machine (FSVM) is used to identify the gear fault types. To verify the effectiveness of the proposed approach, gear data sets gathered from Spectra Quest’s drivetrain dynamics simulator (DDS) are analyzed. Experimental results show that the proposed approach has a significant advantage in gear fault diagnosis, especially under varying working conditions.

17 citations

Journal Article
TL;DR: In this paper, distribution-based functions for the errors in the estimation of the latent variables were derived for both the maximum likelihood and the Bayes methods, and the asymptotic behavior of both the methods was analyzed.
Abstract: Hierarchical statistical models are widely employed in information science and data engineering. The models consist of two types of variables: observable variables that represent the given data and latent variables for the unobservable labels. An asymptotic analysis of the models plays an important role in evaluating the learning process; the result of the analysis is applied not only to theoretical but also to practical situations, such as optimal model selection and active learning. There are many studies of generalization errors, which measure the prediction accuracy of the observable variables. However, the accuracy of estimating the latent variables has not yet been elucidated. For a quantitative evaluation of this, the present paper formulates distribution-based functions for the errors in the estimation of the latent variables. The asymptotic behavior is analyzed for both the maximum likelihood and the Bayes methods.

17 citations

Journal ArticleDOI
TL;DR: This paper presents a novel semantic smoothing method named Higher-Order Smoothing (HOS) for the Naive Bayes algorithm, built on a similar graph based data representation of the HONB which allows semantics in higher-order paths to be exploited.
Abstract: It is known that latent semantic indexing (LSI) takes advantage of implicit higher-order (or latent) structure in the association of terms and documents. Higher-order relations in LSI capture “latent semantics”. These findings have inspired Higher-Order Naive Bayes (HONB), a previously introduced Bayesian classification framework that can explicitly make use of these higher-order relations. In this paper, we present a novel semantic smoothing method named Higher-Order Smoothing (HOS) for the Naive Bayes algorithm. HOS is built on a graph-based data representation similar to that of HONB, which allows semantics in higher-order paths to be exploited. We take the concept one step further in HOS and exploit the relationships between instances of different classes. As a result, we move beyond not only instance boundaries but also class boundaries to exploit the latent information in higher-order paths. This approach improves parameter estimation when dealing with insufficient labeled data. Results of our extensive experiments demonstrate the value of HOS on several benchmark datasets.

17 citations
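For context, HOS replaces the simple Laplace (add-alpha) smoothing used when estimating multinomial Naive Bayes parameters from sparse labeled data. A minimal sketch of that baseline estimator follows; it is the standard method HOS improves upon, not an implementation of HOS itself, and all names are illustrative.

```python
import numpy as np

def train_multinomial_nb(X, y, alpha=1.0):
    """Multinomial Naive Bayes with Laplace (add-alpha) smoothing.

    X: (docs, words) term-count matrix; y: class labels.
    Returns the class list, log priors, and per-class log likelihoods.
    """
    classes = np.unique(y)
    log_prior = np.log(np.array([(y == c).mean() for c in classes]))
    # Per-class word counts, smoothed so unseen words keep nonzero mass.
    word_counts = np.array([X[y == c].sum(axis=0) for c in classes])
    smoothed = word_counts + alpha
    log_lik = np.log(smoothed / smoothed.sum(axis=1, keepdims=True))
    return classes, log_prior, log_lik

def predict_nb(X, classes, log_prior, log_lik):
    """Pick the class maximising log P(c) + sum_w n(d,w) log P(w|c)."""
    scores = log_prior + X @ log_lik.T
    return classes[np.argmax(scores, axis=1)]
```

With little labeled data, the add-alpha term dominates the estimates of P(w|c); HOS instead propagates evidence along higher-order paths across instances (and classes) to get better estimates.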

Proceedings Article
01 Jan 2010
TL;DR: This paper proposes a method that encourages sparsity, by adding regularization constraints on the searched distributions, which can be used with most topic models and lead to a simple modified version of the EM standard optimization procedure.
Abstract: We address the mining of sequential activity patterns from document logs given as word-time occurrences. We achieve this using topics that model both the co-occurrence and the temporal order in which words occur within a temporal window. Discovering such topics, which is particularly hard when multiple activities can occur simultaneously, is conducted through the joint inference of the temporal topics and of their starting times, allowing the implicit alignment of occurrences of the same activity in the document. A current issue is that while we would like topic starting times to be represented by sparse distributions, this is not achieved in practice. Thus, in this paper, we propose a method that encourages sparsity by adding regularization constraints on the searched distributions. The constraints can be used with most topic models (e.g., PLSA, LDA) and lead to a simple modified version of the standard EM optimization procedure. The effect of the sparsity constraint on our activity model and the robustness improvement in the presence of different types of noise have been validated on synthetic data. Its effectiveness is also illustrated in video activity analysis, where the discovered topics capture frequent patterns that implicitly represent typical trajectories of scene objects.

17 citations
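One common way such a sparsity regularizer enters EM is as an entropic-prior-style modification of the M-step: the expected counts for a distribution are sharpened before renormalisation, pushing mass onto a few entries. The sketch below illustrates that generic mechanism only; the exact regularizer and update used in the paper above may differ.

```python
import numpy as np

def sparse_m_step(expected_counts, temperature=0.5):
    """M-step update that sharpens a distribution toward sparsity.

    Raising the expected counts to the power 1/temperature (> 1 when
    temperature < 1) before renormalising concentrates probability mass
    on the largest entries, approximating MAP estimation under an
    entropy-penalising (sparsity) prior. temperature=1 recovers the
    plain maximum-likelihood M-step.
    """
    sharpened = np.asarray(expected_counts, dtype=float) ** (1.0 / temperature)
    return sharpened / sharpened.sum()
```

For expected counts [4, 1], the plain M-step gives [0.8, 0.2], while temperature 0.5 gives [16/17, 1/17]: the same ordering, but a visibly sparser distribution over topic starting times.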


Network Information
Related Topics (5)
Feature extraction: 111.8K papers, 2.1M citations, 84% related
Feature (computer vision): 128.2K papers, 1.7M citations, 84% related
Support vector machine: 73.6K papers, 1.7M citations, 84% related
Deep learning: 79.8K papers, 2.1M citations, 83% related
Object detection: 46.1K papers, 1.3M citations, 82% related
Performance Metrics
No. of papers in the topic in previous years:
Year: Papers
2023: 19
2022: 77
2021: 14
2020: 36
2019: 27
2018: 58