A Comprehensive Study of Clustering Algorithms for Big Data Mining with MapReduce Capability

doi:10.1007/978-981-15-2071-6_34

Book Chapter

Kamlesh Kumar Pandey, +2 more

- pp 427-440

TLDR

This paper categorizes clustering frameworks by volume (dataset size, dimensionality), variety, and velocity (scalability, time complexity), presents a common framework for scaling and speeding up any type of clustering algorithm with MapReduce, and demonstrates the framework using the K-means algorithm.

Abstract:

Big data mining is an active area of modern scientific research that is used across data-related fields such as communication, computing, biology, and geographical science. Big data is characterized by volume, variety, velocity, variability, value, veracity, and visualization. Data mining extracts needed information, knowledge, hidden patterns, and relations from large, heterogeneous datasets collected from multiple sources. Its main techniques for big data are classification, clustering, and association. Clustering is the approach used to mine similar types of data, hidden patterns, and related data. Traditional clustering approaches, such as partition-based, hierarchical, density-based, grid-based, and model-based algorithms, each handle only high volume, high variety, or high velocity, not all three together. Applied to big data mining, these traditional algorithms therefore do not work properly; what is needed are clustering algorithms that work under high volume, high variety, and high velocity at the same time. This paper introduces big data, big data mining, and the concepts behind traditional clustering algorithms. Drawing on theoretical, practical, and existing research perspectives, it categorizes clustering frameworks by volume (dataset size, dimensionality), variety (dataset type, cluster shape), and velocity (scalability, time complexity). It then presents a common framework for scaling and speeding up any type of clustering algorithm with MapReduce, and demonstrates this MapReduce clustering framework using the K-means algorithm.
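
To make the idea of K-means under MapReduce concrete, the following is a minimal single-process sketch in Python. It is not the chapter's framework, whose details the abstract does not give; it only illustrates the usual split, where the map step assigns each point to its nearest centroid and the reduce step averages the points per centroid. The function names (`map_phase`, `reduce_phase`, `kmeans_mapreduce`) and the parameters `max_iter` and `tol` are hypothetical and chosen for illustration.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def map_phase(points, centroids):
    """Map: emit (index of nearest centroid, point) for every point."""
    for p in points:
        idx = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        yield idx, p


def reduce_phase(pairs, centroids):
    """Reduce: replace each centroid by the mean of its assigned points."""
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    updated = list(centroids)  # a centroid with no points stays where it is
    for idx, pts in groups.items():
        updated[idx] = tuple(sum(coord) / len(pts) for coord in zip(*pts))
    return updated


def kmeans_mapreduce(points, centroids, max_iter=20, tol=1e-9):
    """Repeat map + reduce rounds until the centroids stop moving.

    max_iter and tol are illustrative stopping parameters (assumptions).
    """
    for _ in range(max_iter):
        updated = reduce_phase(map_phase(points, centroids), centroids)
        shift = max(dist(a, b) for a, b in zip(centroids, updated))
        centroids = updated
        if shift <= tol:
            break
    return centroids


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (10, 10), (10, 11)]
    print(kmeans_mapreduce(data, [(0, 0), (10, 10)]))
```

In a real MapReduce deployment each mapper would process one split of the dataset and the grouping by centroid index would be done by the shuffle stage; here both happen in one process so that the sketch stays self-contained.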
