Book Chapter

Big Data Analytics-Based Recommendation System Using Ensemble Model

TL;DR: In this article, the authors compared two distinct recommendation frameworks, a single-algorithm model and an ensemble model, and found that the ensemble-based recommendation engine provides better recommendations than the individual algorithms.
Abstract: In computer science, recommender systems (RSs) are tools and methods for making useful product recommendations to end users. To maintain their footholds in a competitive industry, telecoms provide a wide range of offerings, and choosing the best-fit product from such a large bouquet is challenging for a client. Recommendation quality can be improved by using the large amounts of textual contextual data describing item attributes that accompany rating data in many recommender domains. Because users find purchases in the telecom industry difficult, a fresh strategy for improving recommendation systems in the telecommunications industry is proposed here. Users may choose among the recommended services, which are then loaded onto their devices, so a recommendation engine is a simple way for telecoms to increase trust and the customer satisfaction index. The proposed recommendation engine allows users to pick and choose the services they need. The present study compared two distinct recommendation frameworks: a single-algorithm model and an ensemble model. Experiments were conducted to compare the efficacy of the separate algorithms and the ensemble. Interestingly, the ensemble-based recommendation engine proved to provide better recommendations than the individual algorithms.
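As a rough illustration of the ensemble idea (not the chapter's actual pipeline), the Python sketch below averages the scores of two simple recommenders, a user-mean baseline and an item-based collaborative filter, over a toy user-by-service rating matrix; the function names, toy data, and 50/50 weighting are all assumptions made for demonstration.

```python
# A minimal sketch (not the chapter's exact method) of combining two simple
# recommenders into an ensemble by weighted score averaging.
import numpy as np

def user_mean_scores(ratings):
    """Baseline recommender: predict each user's mean rating for every item."""
    means = ratings.sum(axis=1) / np.maximum((ratings > 0).sum(axis=1), 1)
    return np.tile(means[:, None], (1, ratings.shape[1]))

def item_cf_scores(ratings):
    """Item-based collaborative filtering via item-item cosine similarity."""
    norms = np.linalg.norm(ratings, axis=0, keepdims=True) + 1e-9
    item_sim = (ratings.T @ ratings) / (norms.T @ norms)
    return ratings @ item_sim / (np.abs(item_sim).sum(axis=0) + 1e-9)

def ensemble_scores(ratings, weights=(0.5, 0.5)):
    """Ensemble: weighted average of the individual recommenders' scores."""
    w1, w2 = weights
    return w1 * user_mean_scores(ratings) + w2 * item_cf_scores(ratings)

# Toy user x service rating matrix (0 = not rated).
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)
scores = ensemble_scores(R)
# Recommend the highest-scoring unrated service for user 0.
unrated = R[0] == 0
print(int(np.argmax(np.where(unrated, scores[0], -np.inf))))
```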
References
Journal Article
TL;DR: An overview of pattern clustering methods from a statistical pattern recognition perspective is presented, with a goal of providing useful advice and references to fundamental concepts accessible to the broad community of clustering practitioners.
Abstract: Clustering is the unsupervised classification of patterns (observations, data items, or feature vectors) into groups (clusters). The clustering problem has been addressed in many contexts and by researchers in many disciplines; this reflects its broad appeal and usefulness as one of the steps in exploratory data analysis. However, clustering is a combinatorially difficult problem, and differences in assumptions and contexts across communities have made the transfer of useful generic concepts and methodologies slow to occur. This paper presents an overview of pattern clustering methods from a statistical pattern recognition perspective, with the goal of providing useful advice and references to fundamental concepts accessible to the broad community of clustering practitioners. We present a taxonomy of clustering techniques, and identify cross-cutting themes and recent advances. We also describe some important applications of clustering algorithms such as image segmentation, object recognition, and information retrieval.
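For a concrete instance of the partitional branch of the clustering taxonomy this survey covers, here is a minimal NumPy k-means sketch on synthetic 2-D data; the initialization scheme, iteration cap, and data are illustrative assumptions, not the paper's setup.

```python
# Minimal NumPy k-means, illustrating partitional clustering (illustrative only).
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]   # random initial centroids
    for _ in range(iters):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Recompute each centroid as the mean of its assigned points.
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Two well-separated synthetic blobs.
X = np.vstack([np.random.default_rng(1).normal(m, 0.3, size=(20, 2)) for m in (0.0, 3.0)])
labels, centers = kmeans(X, k=2)
print(centers.round(2))
```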

14,054 citations

Journal Article
TL;DR: This paper divides cluster analysis for gene expression data into three categories, presents specific challenges pertinent to each clustering category and introduces several representative approaches, and suggests the promising trends in this field.
Abstract: DNA microarray technology has now made it possible to simultaneously monitor the expression levels of thousands of genes during important biological processes and across collections of related samples. Elucidating the patterns hidden in gene expression data offers a tremendous opportunity for an enhanced understanding of functional genomics. However, the large number of genes and the complexity of biological networks greatly increase the challenges of comprehending and interpreting the resulting mass of data, which often consists of millions of measurements. A first step toward addressing this challenge is the use of clustering techniques, which is essential in the data mining process to reveal natural structures and identify interesting patterns in the underlying data. Cluster analysis seeks to partition a given data set into groups based on specified features so that the data points within a group are more similar to each other than to the points in different groups. A very rich literature on cluster analysis has developed over the past three decades. Many conventional clustering algorithms have been adapted or directly applied to gene expression data, and new algorithms have recently been proposed specifically for gene expression data. These clustering algorithms have proven useful for identifying biologically relevant groups of genes and samples. In this paper, we first briefly introduce the concepts of microarray technology and discuss the basic elements of clustering on gene expression data. In particular, we divide cluster analysis for gene expression data into three categories. Then, we present specific challenges pertinent to each clustering category and introduce several representative approaches. We also discuss the problem of cluster validation in three aspects and review various methods to assess the quality and reliability of clustering results. Finally, we conclude this paper and suggest promising trends in this field.
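As a small illustration of the kind of analysis described, the sketch below runs average-linkage hierarchical clustering with correlation distance on a synthetic gene-by-sample expression matrix; the data, SciPy routines, and parameter choices are assumptions for demonstration, not the paper's methods.

```python
# Hedged sketch: average-linkage hierarchical clustering of a synthetic
# gene-expression matrix (genes x samples) using correlation distance.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# 30 genes x 8 samples: two co-expressed groups plus measurement noise (synthetic).
base = np.vstack([np.tile(rng.normal(0, 1, 8), (15, 1)),
                  np.tile(rng.normal(0, 1, 8), (15, 1))])
expression = base + rng.normal(0, 0.2, size=base.shape)

# 1 - Pearson correlation serves as the distance between gene profiles.
Z = linkage(expression, method='average', metric='correlation')
gene_clusters = fcluster(Z, t=2, criterion='maxclust')
print(gene_clusters)
```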

1,291 citations

Journal Article
TL;DR: This article compares 14 distance measures and their modifications between feature vectors with respect to the recognition performance of the principal component analysis (PCA)-based face recognition method and proposes a modified sum square error (SSE)-based distance.
Abstract: In this article we compare 14 distance measures and their modifications between feature vectors with respect to the recognition performance of the principal component analysis (PCA)-based face recognition method, and propose a modified sum square error (SSE)-based distance. Recognition experiments were performed using a database containing photographs of 423 persons. The experiments showed that the proposed distance measure was among the three best measures with respect to different characteristics of the biometric systems. The best recognition results were achieved using the following distance measures: simplified Mahalanobis, weighted angle-based distance, the proposed modified SSE-based distance, and angle-based distance between whitened feature vectors. Using the modified SSE-based distance, fewer images need to be extracted to achieve 100% cumulative recognition than with any other tested distance measure. We also showed that an algorithmic combination of distance measures can achieve better recognition results than using the distances separately.
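The sketch below computes a few standard distance measures between two PCA feature vectors of the kind compared in the article; the `sse` and `mahalanobis_diag` helpers are generic placeholders, not the authors' modified SSE-based or simplified Mahalanobis definitions, and the vectors and variances are made up for illustration.

```python
# Sketch of a few distance measures between PCA feature vectors; the SSE and
# diagonal-Mahalanobis variants here are placeholders, not the paper's formulas.
import numpy as np

def euclidean(a, b):
    return float(np.linalg.norm(a - b))

def angle_based(a, b):
    # 1 - cosine similarity: smaller means the vectors point the same way.
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def sse(a, b):
    # Plain sum of squared errors (placeholder for a modified SSE-based distance).
    return float(np.sum((a - b) ** 2))

def mahalanobis_diag(a, b, variances):
    # Diagonal Mahalanobis: whiten each component by its variance along the PCA axes.
    return float(np.sqrt(np.sum((a - b) ** 2 / (variances + 1e-12))))

probe, gallery = np.array([0.9, 0.1, -0.4]), np.array([1.0, 0.0, -0.5])
var = np.array([2.0, 1.0, 0.5])   # per-component variances (eigenvalues), assumed known
for name, d in [("euclidean", euclidean(probe, gallery)),
                ("angle-based", angle_based(probe, gallery)),
                ("SSE", sse(probe, gallery)),
                ("mahalanobis", mahalanobis_diag(probe, gallery, var))]:
    print(f"{name}: {d:.4f}")
```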

312 citations

Journal Article
11 Dec 2015 · PLOS ONE
TL;DR: A technical framework is proposed to analyze, compare and benchmark the influence of different similarity measures on the results of distance-based clustering algorithms; it should help the research community identify suitable distance measures for their datasets and facilitate the comparison and evaluation of newly proposed similarity or distance measures against traditional ones.
Abstract: Similarity or distance measures are core components used by distance-based clustering algorithms to cluster similar data points into the same clusters, while dissimilar or distant data points are placed into different clusters. The performance of similarity measures is mostly addressed in two- or three-dimensional spaces, beyond which, to the best of our knowledge, there is no empirical study that has revealed the behavior of similarity measures when dealing with high-dimensional datasets. To fill this gap, a technical framework is proposed in this study to analyze, compare and benchmark the influence of different similarity measures on the results of distance-based clustering algorithms. For reproducibility purposes, fifteen publicly available datasets were used, so that future distance measures can be evaluated and compared with the results of the measures discussed in this work. These datasets were classified into low- and high-dimensional categories to study the performance of each measure against each category. This research should help the research community to identify suitable distance measures for datasets and also to facilitate the comparison and evaluation of newly proposed similarity or distance measures against traditional ones.
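A toy version of such a benchmark might look like the sketch below, which clusters the same dataset under several distance measures and scores label agreement with the adjusted Rand index; the Iris data, average linkage, and chosen metrics are assumptions, not the fifteen datasets or algorithms used in the study.

```python
# Hedged sketch of benchmarking distance measures for distance-based clustering:
# cluster the same data under several metrics and compare against known classes.
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score

X, y = load_iris(return_X_y=True)
for metric in ("euclidean", "cityblock", "cosine", "chebyshev"):
    # Average-linkage hierarchical clustering on the chosen distance measure.
    Z = linkage(pdist(X, metric=metric), method="average")
    labels = fcluster(Z, t=3, criterion="maxclust")
    print(f"{metric:>10}: ARI vs. true classes = {adjusted_rand_score(y, labels):.3f}")
```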

283 citations

Journal Article
TL;DR: A new metric is presented which combines the numerical information of the votes with independent information derived from those values, based on the proportions of common and uncommon votes between each pair of users, and is shown to be superior to the traditional approach in accuracy and other measures.
Abstract: Recommender systems are typically provided as Web 2.0 services and are part of the range of applications that give support to large-scale social networks, enabling on-line recommendations to be made based on the use of networked databases. The operating core of recommender systems is based on the collaborative filtering stage, which, in current user-to-user recommender processes, usually uses the Pearson correlation metric. In this paper, we present a new metric which combines the numerical information of the votes with independent information derived from those values, based on the proportions of the common and uncommon votes between each pair of users. Likewise, we define the reasoning and experiments on which the design of the metric is based, and the restriction of applying it to recommender systems where the possible range of votes is not greater than 5. In order to demonstrate the superior nature of the proposed metric, we provide the comparative results of a set of experiments based on the MovieLens, FilmAffinity and NetFlix databases. In addition to the traditional levels of accuracy, results are also provided on the metrics' coverage, the percentage of hits obtained, and the precision/recall.
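For context, the sketch below implements only the conventional Pearson-correlation similarity over co-rated items, with an optional damping by the share of common votes; it is not the paper's proposed metric, and the `weight_by_overlap` option is a hypothetical illustration of weighting by vote overlap.

```python
# Baseline user-to-user Pearson similarity over co-rated items, optionally
# damped by the share of common votes. This is NOT the paper's proposed metric,
# only the conventional starting point that it improves upon.
import numpy as np

def pearson_cf(u, v, weight_by_overlap=False):
    """u, v: rating vectors on a 1..5 scale, with 0 meaning 'not voted'."""
    common = (u > 0) & (v > 0)
    if common.sum() < 2:
        return 0.0
    uc, vc = u[common], v[common]
    du, dv = uc - uc.mean(), vc - vc.mean()
    denom = np.sqrt((du ** 2).sum() * (dv ** 2).sum())
    sim = 0.0 if denom == 0 else float((du * dv).sum() / denom)
    if weight_by_overlap:
        # Illustrative damping by the proportion of items both users voted on.
        sim *= common.sum() / np.count_nonzero((u > 0) | (v > 0))
    return sim

a = np.array([5, 3, 0, 1, 4])
b = np.array([4, 0, 2, 1, 5])
print(pearson_cf(a, b), pearson_cf(a, b, weight_by_overlap=True))
```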

277 citations