Journal Article

A k-mean clustering algorithm for mixed numeric and categorical data

TL;DR
A clustering algorithm based on the k-mean paradigm that works well for data with mixed numeric and categorical features is presented, and a new cost function and distance measure based on the co-occurrence of values are proposed.
Abstract
The use of traditional k-mean type algorithms is limited to numeric data. This paper presents a clustering algorithm based on the k-mean paradigm that works well for data with mixed numeric and categorical features. We propose a new cost function and distance measure based on the co-occurrence of values. The measures also take into account the significance of an attribute towards the clustering process. We present a modified description of the cluster center to overcome the numeric-data-only limitation of the k-mean algorithm and to provide a better characterization of clusters. The performance of this algorithm has been studied on real-world data sets. Comparisons with other clustering algorithms illustrate the effectiveness of this approach.
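
The abstract does not spell out the formulas, so the following is only a rough sketch under stated assumptions, not the paper's exact definitions. It assumes one common way to realize a co-occurrence-based distance: two values of a categorical attribute are compared through the conditional distributions they induce on every other categorical attribute (total variation distance, averaged over those attributes). The categorical part of a cluster center is kept as the frequency distribution of values in the cluster, and numeric attributes are compared with the cluster mean using per-attribute weights that stand in for attribute significance. The function names `cooccurrence_distance` and `object_to_center_cost`, the `num_weights` dictionary, and the squared-sum form of the cost are all hypothetical.

```python
from collections import Counter


def cooccurrence_distance(rows, i, x, y, cat_cols):
    """Hypothetical co-occurrence distance between values x and y of categorical column i.

    Assumption: for every other categorical column j, compare the conditional
    distributions P(A_j = u | A_i = x) and P(A_j = u | A_i = y) with the total
    variation distance, then average over j. The result lies in [0, 1].
    """
    others = [j for j in cat_cols if j != i]
    if not others:
        # No other categorical attribute to co-occur with: fall back to 0/1 matching.
        return 0.0 if x == y else 1.0
    total = 0.0
    for j in others:
        # Counts of each value of column j among rows where column i equals x (or y).
        px = Counter(r[j] for r in rows if r[i] == x)
        py = Counter(r[j] for r in rows if r[i] == y)
        nx, ny = sum(px.values()), sum(py.values())
        if nx == 0 or ny == 0:
            raise ValueError("both values must occur in column %d" % i)
        support = set(px) | set(py)
        # Total variation distance between the two conditional distributions.
        total += 0.5 * sum(abs(px[u] / nx - py[u] / ny) for u in support)
    return total / len(others)


def object_to_center_cost(obj, members, num_cols, cat_cols, num_weights, dist):
    """Hypothetical cost of assigning obj to the cluster formed by members.

    Numeric columns: weighted squared distance to the cluster mean, where
    num_weights stands in for attribute significance (assumed, not computed here).
    Categorical columns: the center is the value-frequency distribution of the
    cluster; the object's distance to it is the frequency-weighted average of
    dist(i, obj[i], value), and that term is squared (an assumption).
    """
    n = len(members)
    cost = 0.0
    for t in num_cols:
        mean = sum(m[t] for m in members) / n
        cost += (num_weights[t] * (obj[t] - mean)) ** 2
    for i in cat_cols:
        freq = Counter(m[i] for m in members)
        omega = sum(count / n * dist(i, obj[i], value) for value, count in freq.items())
        cost += omega ** 2
    return cost
```

In a full k-mean style loop built on this sketch, each object would be assigned to the cluster with the lowest cost, and the numeric means and categorical frequency distributions would be recomputed until the assignments stop changing. Keeping the categorical center as a distribution rather than a single mode is what lets the center describe mixed clusters without forcing categorical values into a numeric mean.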

Citations
Proceedings Article

Road crack detection using deep convolutional neural network

TL;DR: Quantitative evaluation on a data set of 500 images of size 3264 × 2448, collected with a low-cost smartphone, demonstrates that deep features learned with the proposed deep learning framework provide superior crack detection performance compared with features extracted by existing hand-crafted methods.
Journal Article

A Detailed Investigation and Analysis of Using Machine Learning Techniques for Intrusion Detection

TL;DR: A detailed investigation and analysis of various machine learning techniques is carried out to find the causes of the problems these techniques face in detecting intrusive activities, and future directions for attack detection using machine learning are provided.
Journal Article

The k-means Algorithm: A Comprehensive Survey and Performance Evaluation

TL;DR: Variants of the k-means algorithm, including recent developments, are discussed, and their effectiveness is investigated through experimental analysis on a variety of datasets.
Journal Article

Machine learning and data mining in manufacturing

TL;DR: A comprehensive literature review is presented that gives an overview of how machine learning techniques can be applied to realize manufacturing mechanisms with intelligent actions, and it points to several significant research questions that remain unanswered in recent literature with the same goal.
Journal Article

A hybrid clustering technique combining a novel genetic algorithm with K-Means

TL;DR: A novel GA-based clustering technique is proposed that automatically finds the right number of clusters and identifies the right genes through a novel initial population selection approach; with the help of its novel fitness function and gene rearrangement operation, it produces high-quality cluster centers.