Book Chapter

Hadoop with Intuitionistic Fuzzy C-Means for Clustering in Big Data

Abstract
In recent years, industry and academia have been working to address the data-handling problems posed by big data. This has led to the development of new areas of computing within data mining and data analysis. One technique for handling large data sets is to group similar data into clusters, but clustering at this scale is itself complex. This paper proposes a data clustering algorithm in which Intuitionistic Fuzzy C-Means (IFCM) is used together with Hadoop to produce high-quality clusters and thereby make clustering on very large data sets more efficient. The results of the proposed algorithm are demonstrated on UCI data sets. Performance metrics including accuracy and the SSW, SSB, DB, DD, and SC indices are used to compare the results with those of Parallel K-means (PKM) and modified Parallel K-means (MPKM).
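The abstract does not spell out the IFCM update, so the following is only a minimal sketch of the membership step under stated assumptions: the standard fuzzy C-means membership formula, followed by a hesitation degree derived from a Yager-type complement, which is one common formulation of IFCM and not necessarily the one used in the chapter. The function names, the fuzzifier `m`, and the parameter `alpha` are illustrative choices, and the Hadoop (MapReduce) distribution of the computation is not shown.

```python
import math


def fcm_memberships(point, centers, m=2.0):
    """Standard fuzzy C-means memberships of one point to each center.

    `m` is the fuzzifier (m > 1); the default of 2.0 is an assumption.
    """
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on a center belongs fully to that center
    # (shared equally if several centers coincide with the point).
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        total = sum(hits)
        return [h / total for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_j) ** power for d_j in dists) for d_i in dists]


def intuitionistic_memberships(mu, alpha=0.85):
    """Add a hesitation degree to each fuzzy membership.

    Assumed form: pi = 1 - mu - (1 - mu**alpha) ** (1 / alpha), with
    0 < alpha <= 1 so that pi is non-negative. The value 0.85 is hypothetical.
    """
    result = []
    for u in mu:
        u = min(max(u, 0.0), 1.0)  # guard against rounding outside [0, 1]
        hesitation = 1.0 - u - (1.0 - u ** alpha) ** (1.0 / alpha)
        result.append(u + hesitation)
    return result


# Example: one 2-D point and two hypothetical cluster centers.
mu = fcm_memberships((0.0, 0.0), [(1.0, 0.0), (3.0, 0.0)])
print(mu)
print(intuitionistic_memberships(mu))
```

With `alpha = 1.0` the hesitation term vanishes and the sketch reduces to ordinary fuzzy C-means memberships. In a Hadoop setting, a step like this would typically run per data point inside the map phase, with the cluster centers recomputed in the reduce phase, but that split is an assumption and is not described in the abstract.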


Citations
Journal Article

Soft and Declarative Fishing of Information in Big Data Lake

TL;DR: It is shown how fuzzy techniques can be incorporated into big data analytics carried out with the declarative U-SQL language over a big data lake located in the cloud. The solution directly addresses three characteristics of big data (volume, variety, and velocity) and indirectly addresses veracity and value.
Journal Article

Informational Paradigm, management of uncertainty and theoretical formalisms in the clustering framework

TL;DR: It is shown how all these clustering approaches are capable of managing, in different ways, the uncertainty associated with the two components of the Informational Paradigm, i.e., the Empirical and Theoretical Information.
Posted Content

Informational Paradigm, Management of Uncertainty and Theoretical Formalisms in the Clustering Framework: a Review

Pierpaolo D'Urso
- 01 Nov 2017 - 
TL;DR: The review looks back on fifty years of clustering based on fuzzy set theory, a line of work that builds on L. A. Zadeh's 1965 paper "Fuzzy Sets".
Journal Article

A Hopping Umbrella for Fuzzy Joining Data Streams From IoT Devices in the Cloud and on the Edge

TL;DR: A hopping umbrella is presented that fuzzifies timestamps from sensor readings while flexibly joining data streams from asynchronous IoT devices. It is able to properly join the best-matching sensor readings and, in some scenarios, to reduce the amount of data transferred to the Cloud data center without significant overhead in the resource utilization of stream processing units.
Book Chapter

Uncertainty-Based Clustering Algorithms for Large Data Sets

TL;DR: The chapter aims to present the uncertainty-based clustering algorithms developed so far and to propose a few new algorithms that can be developed further.