Proceedings Article

A parallel Probabilistic Latent Semantic Analysis method on MapReduce platform

Zhao Liang, +2 more
- pp 1017-1022
Abstract
Probabilistic Latent Semantic Analysis (PLSA) is a powerful statistical technique for analyzing relations in co-occurrence data and is widely used in automated information processing tasks. However, it involves non-trivial computation and is often difficult and time-consuming to train when the dataset is large. MapReduce is a computing framework designed by Google that provides a distributed solution to large-scale data analysis tasks using clusters of computers. In this work, we address the scalability problem of PLSA by proposing and implementing a parallel method to train PLSA under the MapReduce computing framework. Experimental results show that when the training dataset is large, learning the probability distributions of the PLSA model in parallel achieves almost linear speedups and thus provides a practical solution for large-scale data analysis applications.
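
The abstract does not spell out how the training is split into map and reduce steps, so the following is only a minimal single-process sketch of one common way to do it, not the authors' implementation. It assumes the asymmetric PLSA parameterization with P(w|z) and P(z|d), treats each document as one map input, and uses hypothetical names (`init_params`, `map_document`, `reduce_sum`, `em_iteration`, `log_likelihood`). The map step runs the E-step for a document and emits expected counts n(d,w)·P(z|d,w); the reduce step sums them per key; a final pass renormalizes, which completes the M-step.

```python
import math
import random
from collections import defaultdict


def init_params(docs, n_topics, seed=0):
    """Random, normalized starting values for P(w|z) and P(z|d)."""
    rng = random.Random(seed)
    vocab = sorted({w for counts in docs.values() for w in counts})
    p_w_z = []
    for _ in range(n_topics):
        raw = {w: rng.random() + 0.1 for w in vocab}
        total = sum(raw.values())
        p_w_z.append({w: v / total for w, v in raw.items()})
    p_z_d = {}
    for d in docs:
        raw = [rng.random() + 0.1 for _ in range(n_topics)]
        total = sum(raw)
        p_z_d[d] = [v / total for v in raw]
    return p_w_z, p_z_d


def map_document(doc_id, counts, p_w_z, p_z_d):
    """Map step: E-step for one document, emitting expected counts."""
    n_topics = len(p_w_z)
    for w, n in counts.items():
        joint = [p_w_z[z][w] * p_z_d[doc_id][z] for z in range(n_topics)]
        norm = sum(joint)
        for z in range(n_topics):
            expected = n * joint[z] / norm  # n(d,w) * P(z|d,w)
            yield ("word", z, w), expected
            yield ("doc", doc_id, z), expected


def reduce_sum(pairs):
    """Reduce step: sum the values that share a key."""
    totals = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    return totals


def em_iteration(docs, p_w_z, p_z_d):
    """One EM iteration: map over documents, reduce, then renormalize."""
    n_topics = len(p_w_z)
    emitted = (
        pair
        for doc_id, counts in docs.items()
        for pair in map_document(doc_id, counts, p_w_z, p_z_d)
    )
    totals = reduce_sum(emitted)

    new_p_w_z = [dict.fromkeys(p_w_z[z], 0.0) for z in range(n_topics)]
    new_p_z_d = {d: [0.0] * n_topics for d in docs}
    for key, value in totals.items():
        if key[0] == "word":
            _, z, w = key
            new_p_w_z[z][w] = value
        else:
            _, d, z = key
            new_p_z_d[d][z] = value
    # M-step normalization so each distribution sums to one.
    for z in range(n_topics):
        total = sum(new_p_w_z[z].values())
        new_p_w_z[z] = {w: v / total for w, v in new_p_w_z[z].items()}
    for d in docs:
        total = sum(new_p_z_d[d])
        new_p_z_d[d] = [v / total for v in new_p_z_d[d]]
    return new_p_w_z, new_p_z_d


def log_likelihood(docs, p_w_z, p_z_d):
    """Sum over (d, w) of n(d,w) * log sum_z P(w|z) P(z|d)."""
    n_topics = len(p_w_z)
    return sum(
        n * math.log(sum(p_w_z[z][w] * p_z_d[d][z] for z in range(n_topics)))
        for d, counts in docs.items()
        for w, n in counts.items()
    )
```

Because each call to `map_document` reads only the current parameters and its own document, the calls are independent and could in principle be distributed across workers, with `reduce_sum` playing the role of the framework's shuffle-and-reduce phase. On a real cluster the parameters would also have to be shipped to every mapper on each iteration, which this sketch ignores.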


Citations
Journal Article

Detecting shilling attacks in recommender systems based on analysis of user rating behavior

TL;DR: This work proposes a novel unsupervised shilling attack detection model based on an analysis of user rating behavior that measures the diversity and memory of users’ interest preferences by entropy and block entropy, respectively, and analyzes the memory of user rating preferences by a self-correlation analysis.
Journal Article

Empirical study using network of semantically related associations in bridging the knowledge gap

TL;DR: An integrated system, such as ARIANA, could assist the human expert in exploratory literature search by bringing forward hidden associations, promoting data reuse and knowledge discovery as well as stimulating interdisciplinary projects by connecting information across the disciplines.
Journal Article

GPU Parallel Implementation of Dual-Depth Sparse Probabilistic Latent Semantic Analysis for Hyperspectral Unmixing

TL;DR: The experimental results show that the parallel versions of the DEpLSA and the traditional pLSA approach can provide accurate HU results fast enough for practical use, accelerating the corresponding serial versions by at least 30x on the GTX 1080 and up to 147x on the Tesla P100 GPU. These acceleration factors increase with the image size, allowing for the fast processing of massive HS data repositories.
Journal Article

An Efficient Distributed Algorithm for Big Data Processing

TL;DR: Simulation results demonstrate that the proposed distributed algorithm outperforms traditional distributed algorithms in terms of the size of data to be processed at the central server and data processing time.
Book Chapter

Semantic Model for Web-Based Big Data Using Ontology and Fuzzy Rule Mining

TL;DR: A novel approach for semantic analysis of web-based big data that uses rule-based ontology mapping with fuzzy rule-based resource representation; refined semantic relation reasoning mining is applied to obtain an overall knowledge representation.
References
Journal Article

Optimization by Simulated Annealing

TL;DR: There is a deep and useful connection between statistical mechanics and multivariate or combinatorial optimization (finding the minimum of a given function depending on many parameters), and a detailed analogy with annealing in solids provides a framework for optimization of very large and complex systems.
Journal Article

Support-Vector Networks

TL;DR: High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated and the performance of the support-vector network is compared to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.
Journal Article

Latent dirichlet allocation

TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.
Proceedings Article

Latent Dirichlet Allocation

TL;DR: This paper proposed a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).