Book Chapter
A Comprehensive Study of Clustering Algorithms for Big Data Mining with MapReduce Capability
Kamlesh Kumar Pandey, Diwakar Shukla, Ram Milan +2 more
- pp 427-440
TLDR
This paper categorizes clustering frameworks by volume (dataset size, dimensionality), variety, and velocity (scalability, time complexity), presents a common framework for scaling and speeding up any type of clustering algorithm with MapReduce, and demonstrates this MapReduce clustering framework using the K-means algorithm.
Abstract:
Big data mining is an area of modern scientific research used across data-related fields such as communication, computing, biology, and geographical science. Big data is characterized by volume, variety, velocity, variability, value, veracity, and visualization. Data mining extracts needed information, knowledge, hidden patterns, and relations from large, heterogeneously formatted datasets collected from multiple sources. It offers classification, clustering, and association techniques for big data mining. Clustering is one such approach, used to mine similar types of data, hidden patterns, and related data. Traditional clustering approaches, such as partition, hierarchical, density, grid, and model-based algorithms, each work under only high volume, or high variety, or high velocity. Applied to big data mining, these traditional algorithms do not work properly; big data mining needs clustering algorithms that work under high volume, high variety, and high velocity together. This paper introduces big data, big data mining, and traditional clustering algorithm concepts. From a theoretical, practical, and existing-research perspective, it categorizes clustering frameworks by volume (dataset size, dimensionality), variety (dataset type, cluster shape), and velocity (scalability, time complexity), presents a common framework for scaling and speeding up any type of clustering algorithm with MapReduce, and demonstrates this MapReduce clustering framework using the K-means algorithm.
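The abstract does not spell out how K-means is mapped onto MapReduce, so the following is only a minimal single-process sketch of one common formulation, not the authors' implementation. It assumes the map step assigns each point to its nearest centroid and emits a partial (sum, count) pair, and the reduce step averages those pairs per cluster. All function names (`map_phase`, `reduce_phase`, `kmeans_mapreduce`) and the `splits` layout are hypothetical, chosen for illustration.

```python
from collections import defaultdict
from functools import reduce
import math


def nearest(point, centroids):
    """Return the index of the centroid closest to point (squared Euclidean)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_phase(split, centroids):
    """Map: for each point in one data split, emit (cluster_id, (point, 1))."""
    return [(nearest(p, centroids), (p, 1)) for p in split]


def combine(a, b):
    """Merge two partial (vector_sum, count) pairs."""
    (sum_a, n_a), (sum_b, n_b) = a, b
    return tuple(x + y for x, y in zip(sum_a, sum_b)), n_a + n_b


def reduce_phase(pairs, centroids):
    """Reduce: group pairs by cluster id and average them into new centroids."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    updated = list(centroids)  # clusters that received no points keep their centroid
    for key, values in groups.items():
        total, count = reduce(combine, values)
        updated[key] = tuple(x / count for x in total)
    return updated


def kmeans_mapreduce(splits, centroids, max_iter=20, tol=1e-9):
    """Repeat map and reduce rounds until the centroids stop moving."""
    for _ in range(max_iter):
        # In a real cluster each split would be mapped on a different node.
        pairs = [kv for split in splits for kv in map_phase(split, centroids)]
        updated = reduce_phase(pairs, centroids)
        shift = max(math.dist(a, b) for a, b in zip(centroids, updated))
        centroids = updated
        if shift <= tol:
            break
    return centroids


if __name__ == "__main__":
    splits = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 10.0), (10.0, 11.0)]]
    print(kmeans_mapreduce(splits, [(0.0, 0.0), (10.0, 10.0)]))
```

Emitting (sum, count) pairs rather than raw points keeps the reduce step associative, so partial sums could be combined on each node before being shuffled; that is the usual reason this formulation scales with dataset size.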
Citations
Journal Article
Maxmin Data Range Heuristic-Based Initial Centroid Method of Partitional Clustering for Big Data Mining
Book Chapter
Approximate Partitional Clustering Through Systematic Sampling in Big Data Mining
TL;DR: The experimental evaluation shows that the SYK-means algorithm achieves better effectiveness and efficiency as measured by R-squared, root-mean-square standard deviation, Davies-Bouldin, Calinski-Harabasz, Silhouette coefficient, CPU time, and convergence validation indices.
Journal Article
Real-time analysis and predictability of the health functional food market using big data.
TL;DR: In this paper, the authors conducted a real-time analysis of the health functional food market using big data, and the results demonstrate how APIs can be used to predict market size in the food industry effectively.
Journal Article
Min max kurtosis distance based improved initial centroid selection approach of K-means clustering for big data mining on gene expression data
Journal Article
NDPD: an improved initial centroid method of partitional clustering for big data mining
TL;DR: The experimental evaluation demonstrates that the NDPDKM algorithm reduces iterations, local optima, and computing costs, and improves cluster performance, effectiveness, and efficiency, with stable convergence compared to other algorithms.
References
Some methods for classification and analysis of multivariate observations
TL;DR: The k-means algorithm partitions an N-dimensional population into k sets on the basis of a sample; the procedure, a generalization of the ordinary sample mean, is shown to give partitions that are reasonably efficient in the sense of within-class variance.
Journal Article
Data clustering: a review
TL;DR: An overview of pattern clustering methods from a statistical pattern recognition perspective is presented, with a goal of providing useful advice and references to fundamental concepts accessible to the broad community of clustering practitioners.
Book Chapter
A Survey of Clustering Data Mining Techniques
TL;DR: This survey concentrates on clustering algorithms from a data mining perspective as a data modeling technique that provides for concise summaries of the data.
Journal Article
Beyond the hype
Amir H. Gandomi, Murtaza Haider +1 more
TL;DR: The need to develop appropriate and efficient analytical methods to leverage massive volumes of heterogeneous data in unstructured text, audio, and video formats is highlighted and the need to devise new tools for predictive analytics for structured big data is reinforced.
Journal Article
Big Data: A Survey
Min Chen, Shiwen Mao, Yunhao Liu +2 more
TL;DR: The background and state of the art of big data are reviewed, along with related technologies and representative applications including enterprise management, Internet of Things, online social networks, medical applications, collective intelligence, and smart grid.