Journal Article

An Improved Algorithm of Rough K-Means Clustering Based on Variable Weighted Distance Measure

TL;DR: An improved rough k-means clustering algorithm based on a variable weighted distance measure is presented, and comparative experiments on real-world data from UCI demonstrate its validity.
Abstract: The rough k-means algorithm has been shown to provide a reasonable set of lower and upper bounds for a given dataset. Using the concepts of lower and upper approximation sets, rough k-means clustering and its emerging derivatives have become valid algorithms for clustering vague information. However, most available algorithms ignore the differences in distance between data objects and cluster centers when computing the new mean of each cluster. To solve this issue, an improved rough k-means clustering algorithm based on a variable weighted distance measure is presented in this article. Comparative experimental results on real-world data from UCI demonstrate the validity of the proposed algorithm.
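The idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's weighting formula, so everything below is an assumption made for illustration: the ratio test with `threshold`, the mixing weight `w_lower`, and the hypothetical weight `1 / (1 + distance)` are not taken from the article. The sketch only shows where a distance-dependent weight could enter the centroid update of a rough k-means iteration.

```python
import numpy as np

def rough_assign(X, centers, threshold=1.3):
    """Build lower and upper approximations with a ratio test (an assumption)."""
    # d[i, k] is the Euclidean distance from object i to center k
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    d_min = d.min(axis=1)
    # object i is in the upper approximation of every cluster whose center
    # is at most `threshold` times farther away than its nearest center
    upper = d <= threshold * d_min[:, None]
    # lower approximation: objects that belong to exactly one upper approximation
    lower = upper & (upper.sum(axis=1, keepdims=True) == 1)
    return lower, upper, d

def update_centers(X, lower, upper, d, centers, w_lower=0.7, weighted=True):
    """Recompute centers from the lower approximation and the boundary region."""
    new_centers = centers.astype(float)
    for k in range(centers.shape[0]):
        low = lower[:, k]
        bnd = upper[:, k] & ~low  # boundary region of cluster k

        def mean(mask):
            if weighted:
                # hypothetical weight: closer objects count more
                w = 1.0 / (1.0 + d[mask, k])
            else:
                # conventional rough k-means: every object counts equally
                w = np.ones(int(mask.sum()))
            return (w[:, None] * X[mask]).sum(axis=0) / w.sum()

        if low.any() and bnd.any():
            new_centers[k] = w_lower * mean(low) + (1.0 - w_lower) * mean(bnd)
        elif low.any():
            new_centers[k] = mean(low)
        elif bnd.any():
            new_centers[k] = mean(bnd)
        # an empty cluster keeps its previous center
    return new_centers

# Toy data: two tight groups and one ambiguous object halfway between them
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0], [5.0, 0.5]])
centers = np.array([[0.0, 0.5], [10.0, 0.5]])
lower, upper, d = rough_assign(X, centers)
print(update_centers(X, lower, upper, d, centers))
```

Setting `weighted=False` recovers the unweighted update that, according to the abstract, most existing algorithms use; comparing the two settings on the same data shows how much the distance weighting moves each center.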
Citations
Book Chapter
01 Jan 2017
TL;DR: The review starts with RST in the context of data preprocessing as well as the generation of both descriptive and predictive knowledge via decision rule induction, association rule mining and clustering.
Abstract: This chapter emphasizes the role played by rough set theory (RST) within the broad field of Machine Learning (ML). As a sound data analysis and knowledge discovery paradigm, RST has much to offer to the ML community. We surveyed the existing literature and reported on the most relevant RST theoretical developments and applications in this area. The review starts with RST in the context of data preprocessing (discretization, feature selection, instance selection and meta-learning) as well as the generation of both descriptive and predictive knowledge via decision rule induction, association rule mining and clustering. Afterward, we examined several special ML scenarios in which RST has recently been introduced, such as imbalanced classification, multi-label classification, dynamic/incremental learning, Big Data analysis and cost-sensitive learning.

31 citations

Journal Article
TL;DR: The sparse subspace clustering (SSC) algorithm is introduced to analyze time series data and performs better on both the artificial data set and the daily box-office data than recently developed, well-known clustering algorithms such as K-means and spectral clustering.
Abstract: Movie box-office research is important work for the rapid development of the film industry, and it is also a challenging task. Our study focuses on finding regular box-office revenue patterns. Clustering is an unsupervised machine learning approach that groups data without prior knowledge of the classes. Unlike static data, time series data vary with time. Less work has focused on time series clustering than on static data clustering. In this paper, the sparse subspace clustering (SSC) algorithm is introduced to analyze time series data. The SSC algorithm performs better on both the artificial data set and the daily box-office data than recently developed, well-known clustering algorithms such as K-means and spectral clustering. On the artificial data set, SSC is more suitable for time series, whether judged by clustering error or by visualization. On the actual data, movies are divided into five clusters by the SSC algorithm, and each cluster represents a distinct type of distribution pattern. These patterns can be used in movie recommendation and film evaluation, and can guide theater exhibitors and distributors. In addition, this is the first time SSC has been applied to the time series clustering problem, and it achieves a satisfactory result.

12 citations

References
Book
08 Sep 2000
TL;DR: This book presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects, and provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.
Abstract: The increasing volume of data in modern business and science calls for more complex and sophisticated tools. Although advances in data mining technology have made extensive data collection much easier, the field is still evolving and there is a constant need for new techniques and tools that can help us transform this data into useful information and knowledge. Since the previous edition's publication, great advances have been made in the field of data mining. Not only does the third edition of Data Mining: Concepts and Techniques continue the tradition of equipping you with an understanding and application of the theory and practice of discovering patterns hidden in large data sets, it also focuses on new, important topics in the field: data warehouses and data cube technology, mining stream, mining social networks, and mining spatial, multimedia and other complex data. Each chapter is a stand-alone guide to a critical topic, presenting proven algorithms and sound implementations ready to be used directly or with strategic modification against live data. This is the resource you need if you want to apply today's most powerful data mining techniques to meet real business challenges. * Presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects. * Addresses advanced topics such as mining object-relational databases, spatial databases, multimedia databases, time-series databases, text databases, the World Wide Web, and applications in several fields. * Provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.

23,600 citations


"An Improved Algorithm of Rough K-Me..." refers methods in this paper

  • ...Currently, clustering algorithms have been applied in a number of areas including data mining, statistics, machine learning, spatial database technology [1]....


Book
01 Jan 2008
TL;DR: In this paper, generalized estimating equations (GEE) with computing using PROC GENMOD in SAS and multilevel analysis of clustered binary data using generalized linear mixed-effects models with PROC LOGISTIC are discussed.
Abstract: tic regression, and it concerns studying the effect of covariates on the risk of disease. The chapter includes generalized estimating equations (GEE’s) with computing using PROC GENMOD in SAS and multilevel analysis of clustered binary data using generalized linear mixed-effects models with PROC LOGISTIC. As a prelude to the following chapter on repeated-measures data, Chapter 5 presents time series analysis. The material on repeated-measures analysis uses linear additive models with GEE’s and PROC MIXED in SAS for linear mixed-effects models. Chapter 7 is about survival data analysis. All computing throughout the book is done using SAS procedures.

9,995 citations

Journal Article
U.M. Feyyad
TL;DR: Without a concerted effort to develop knowledge discovery techniques, organizations stand to forfeit much of the value from the data they currently collect and store.
Abstract: Current computing and storage technology is rapidly outstripping society's ability to make meaningful use of the torrent of available data. Without a concerted effort to develop knowledge discovery techniques, organizations stand to forfeit much of the value from the data they currently collect and store.

4,806 citations

01 Jan 1999
TL;DR: The topics in LNAI include automated reasoning, automated programming, algorithms, knowledge representation, agent-based systems, intelligent systems, expert systems, machine learning, natural-language processing, machine vision, robotics, search systems, knowledge discovery, data mining, and related programming languages.
Abstract: LNAI was established in the mid-1980s as a topical subseries of LNCS focusing on artificial intelligence. This subseries is devoted to the publication of state-of-the-art research results in artificial intelligence, at a high level and in both printed and electronic versions making use of the well-established LNCS publication machinery. As with the LNCS mother series, proceedings and postproceedings are at the core of LNAI; however, all other sublines are available for LNAI as well. The topics in LNAI include automated reasoning, automated programming, algorithms, knowledge representation, agent-based systems, intelligent systems, expert systems, machine learning, natural-language processing, machine vision, robotics, search systems, knowledge discovery, data mining, and related programming languages.

3,464 citations