Proceedings Article
Selecting the right interestingness measure for association patterns
Pang-Ning Tan, Vipin Kumar, Jaideep Srivastava +2 more
- pp 32-41
TLDR
An overview of various measures proposed in the statistics, machine learning and data mining literature is presented, and it is shown that each measure has different properties that make it useful for some application domains but not for others.
Abstract:
Many techniques for association rule mining and feature selection require a suitable metric to capture the dependencies among variables in a data set. For example, metrics such as support, confidence, lift, correlation, and collective strength are often used to determine the interestingness of association patterns. However, many such measures provide conflicting information about the interestingness of a pattern, and the best metric to use for a given application domain is rarely known. In this paper, we present an overview of various measures proposed in the statistics, machine learning and data mining literature. We describe several key properties one should examine in order to select the right measure for a given application domain. A comparative study of these properties is made using twenty-one of the existing measures. We show that each measure has different properties that make it useful for some application domains but not for others. We also present two scenarios in which most of the existing measures agree with each other, namely, support-based pruning and table standardization. Finally, we present an algorithm to select a small set of tables such that an expert can select a desirable measure by looking at just this small set of tables.
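As a rough illustration of how such measures can conflict, the sketch below computes support, confidence, lift, and the phi correlation coefficient from a 2x2 contingency table using their standard textbook definitions. The function name `interestingness` and the two tables of counts are hypothetical, chosen only for illustration; they are not taken from the paper's experiments.

```python
from math import sqrt

def interestingness(f11, f10, f01, f00):
    """Compute four common measures for the pattern {A, B}.

    f11: transactions containing both A and B
    f10: transactions containing A but not B
    f01: transactions containing B but not A
    f00: transactions containing neither
    (Hypothetical helper; the formulas are the standard definitions.)
    """
    n = f11 + f10 + f01 + f00
    p_ab = f11 / n              # P(A, B)
    p_a = (f11 + f10) / n       # P(A)
    p_b = (f11 + f01) / n       # P(B)
    return {
        "support": p_ab,
        "confidence": p_ab / p_a,           # P(B | A)
        "lift": p_ab / (p_a * p_b),
        "phi": (p_ab - p_a * p_b)
               / sqrt(p_a * (1 - p_a) * p_b * (1 - p_b)),
    }

# Two illustrative tables (assumed counts): same off-diagonal cells,
# but the co-presence and co-absence counts are swapped.
table_1 = interestingness(f11=880, f10=50, f01=50, f00=20)
table_2 = interestingness(f11=20, f10=50, f01=50, f00=880)

for name, m in (("table_1", table_1), ("table_2", table_2)):
    print(name, {k: round(v, 3) for k, v in m.items()})
```

With these assumed counts, support and confidence rank the first table higher, lift ranks the second table higher, and the phi coefficient scores both tables the same, so the choice of measure changes which pattern looks more interesting.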
Citations
Data Mining: Concepts and Techniques (2nd edition)
Jiawei Han, Micheline Kamber +1 more
TL;DR: There have been many data mining books published in recent years, including Predictive Data Mining by Weiss and Indurkhya [WI98], Data Mining Solutions: Methods and Tools for Solving Real-World Problems by Westphal and Blaxton [WB98], and Mastering Data Mining: The Art and Science of Customer Relationship Management by Berry and Linoff [BL99].
Journal Article
The Google Similarity Distance
TL;DR: A new theory of similarity between words and phrases based on information distance and Kolmogorov complexity is presented, which is applied to construct a method to automatically extract similarity, the Google similarity distance, of words and phrases from the WWW using Google page counts.
Journal Article
Frequent pattern mining: current status and future directions
TL;DR: It is believed that frequent pattern mining research has substantially broadened the scope of data analysis and will have a deep impact on data mining methodologies and applications in the long run. However, there are still some challenging research issues that need to be solved before frequent pattern mining can claim to be a cornerstone approach in data mining applications.
Journal Article
Interestingness measures for data mining: A survey
Liqiang Geng, Howard J. Hamilton +1 more
TL;DR: This survey reviews the interestingness measures for rules and summaries, classifies them from several perspectives, compares their properties, identifies their roles in the data mining process, gives strategies for selecting appropriate measures for applications, and identifies opportunities for future research in this area.
References
Proceedings Article
Mining association rules between sets of items in large databases
TL;DR: An efficient algorithm is presented that generates all significant association rules between items in the database of customer transactions and incorporates buffer management and novel estimation and pruning techniques.
Journal Article
Categorical Data Analysis
TL;DR: In this article, categorical data analysis was used for categorical classification of categorical datasets.
Journal Article
Data mining and knowledge discovery: making sense out of data
TL;DR: Without a concerted effort to develop knowledge discovery techniques, organizations stand to forfeit much of the value from the data they currently collect and store.
Posted Content
Principles of data mining
TL;DR: This paper gives a lightning overview of data mining and its relation to statistics, with particular emphasis on tools for the detection of adverse drug reactions.