Book Chapter

A Comprehensive Study of Clustering Algorithms for Big Data Mining with MapReduce Capability

Kamlesh Kumar Pandey, +2 more
- pp 427-440
TLDR
This paper categorizes clustering frameworks by volume (dataset size, dimensionality), variety, and velocity (scalability, time complexity). It presents a common framework for scaling and speeding up any type of clustering algorithm with MapReduce and demonstrates this MapReduce clustering framework using the K-means algorithm.
Abstract
Big data mining is a modern research area used across data-related fields such as communication, computing, biology, and geographical science. Big data is characterized by volume, variety, velocity, variability, value, veracity, and visualization. Data mining techniques extract needed information, knowledge, hidden patterns, and relations from large, heterogeneously formatted datasets collected from multiple sources. Data mining offers classification, clustering, and association techniques for big data mining. Clustering is one such approach, used to mine similar types of data, hidden patterns, and related data. Traditional clustering approaches, such as partition-based, hierarchical, density-based, grid-based, and model-based algorithms, each handle only high volume, high variety, or high velocity, not all three together. Applied to big data mining, these traditional algorithms do not work properly; what is needed are clustering algorithms that work under high volume, high variety, and high velocity. This paper introduces big data, big data mining, and traditional clustering algorithm concepts. From theoretical, practical, and existing-research perspectives, it categorizes clustering frameworks based on volume (dataset size, dimensionality), variety (dataset type, cluster shape), and velocity (scalability, time complexity). It also presents a common framework for scaling and speeding up any type of clustering algorithm with MapReduce, and demonstrates this MapReduce clustering framework using the K-means algorithm.
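The abstract does not spell out the framework itself, so the following is only a minimal single-machine sketch of how K-means is commonly split into map and reduce steps, not the authors' implementation. The function names (`map_phase`, `reduce_phase`, `kmeans_mapreduce`), the Euclidean distance, and the stopping rule are all assumptions made for illustration; a real deployment would run the two phases on a cluster through a MapReduce engine.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def map_phase(points, centroids):
    """Map: emit (index of nearest centroid, (point, 1)) for every point."""
    for p in points:
        idx = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        yield idx, (p, 1)


def reduce_phase(pairs, centroids):
    """Reduce: sum coordinates and counts per key, then average them."""
    sums = {}
    counts = defaultdict(int)
    for idx, (p, n) in pairs:
        if idx in sums:
            sums[idx] = [s + x for s, x in zip(sums[idx], p)]
        else:
            sums[idx] = list(p)
        counts[idx] += n
    # Assumption: an empty cluster keeps its previous centroid.
    return [
        tuple(s / counts[i] for s in sums[i]) if i in sums else tuple(centroids[i])
        for i in range(len(centroids))
    ]


def kmeans_mapreduce(points, centroids, max_iter=20, tol=1e-9):
    """Repeat map and reduce until no centroid moves more than tol."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        new_centroids = reduce_phase(map_phase(points, centroids), centroids)
        shift = max(dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
    print(kmeans_mapreduce(data, [(0.0, 0.0), (10.0, 10.0)]))
```

In this sketch each mapper needs only the current centroids and its own share of the points, which is why the assignment step parallelizes naturally; the reducer sees one partial sum and count per cluster rather than the raw data.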


Citations
Book Chapter

Approximate Partitional Clustering Through Systematic Sampling in Big Data Mining

TL;DR: Experimental evaluation shows that the SYK-means algorithm achieves better effectiveness and efficiency as measured by R-squared, root-mean-square standard deviation, Davies-Bouldin, Calinski-Harabasz, silhouette coefficient, CPU time, and convergence validation indices.
Journal Article

Real-time analysis and predictability of the health functional food market using big data.

TL;DR: In this paper, the authors conducted a real-time analysis of the health functional food market using big data, and the results demonstrate how APIs can be used to predict market size in the food industry effectively.
Journal Article

NDPD: an improved initial centroid method of partitional clustering for big data mining

TL;DR: The experimental evaluation demonstrates that the NDPDKM algorithm reduces iterations, local optima, and computing costs, and improves cluster performance, effectiveness, and efficiency with stable convergence compared to other algorithms.
References

Some methods for classification and analysis of multivariate observations

TL;DR: The k-means algorithm described in this paper partitions an N-dimensional population into k sets on the basis of a sample; the procedure generalizes the ordinary sample mean and is shown to give partitions that are reasonably efficient in the sense of within-class variance.
Journal Article

Data clustering: a review

TL;DR: An overview of pattern clustering methods from a statistical pattern recognition perspective is presented, with a goal of providing useful advice and references to fundamental concepts accessible to the broad community of clustering practitioners.
Book Chapter

A Survey of Clustering Data Mining Techniques

TL;DR: This survey concentrates on clustering algorithms from a data mining perspective as a data modeling technique that provides for concise summaries of the data.
Journal Article

Beyond the hype

TL;DR: The need to develop appropriate and efficient analytical methods to leverage massive volumes of heterogeneous data in unstructured text, audio, and video formats is highlighted and the need to devise new tools for predictive analytics for structured big data is reinforced.
Journal Article

Big Data: A Survey

TL;DR: The background and state of the art of big data are reviewed, covering related technologies and representative applications including enterprise management, Internet of Things, online social networks, medical applications, collective intelligence, and smart grid.