Topic

Lift (data mining)

About: Lift (data mining) is a research topic. Over the lifetime, 3,699 publications have been published within this topic, receiving 27,562 citations. The topic is also known as: lift curve.
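
For context, the lift of an association rule A ⇒ B is standardly defined as the ratio of the observed joint probability to the probability expected if antecedent and consequent were independent (a textbook definition, not quoted from any paper below):

```latex
\mathrm{lift}(A \Rightarrow B)
  = \frac{P(A \cap B)}{P(A)\,P(B)}
  = \frac{\mathrm{confidence}(A \Rightarrow B)}{\mathrm{support}(B)}
```

A lift of 1 indicates independence between A and B; values above 1 indicate a positive association, values below 1 a negative one.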


Papers

Proceedings ArticleDOI
23 Jul 2002
TL;DR: An overview of various measures proposed in the statistics, machine learning and data mining literature is presented, and it is shown that each measure has different properties that make it useful for some application domains but not for others.
Abstract: Many techniques for association rule mining and feature selection require a suitable metric to capture the dependencies among variables in a data set. For example, metrics such as support, confidence, lift, correlation, and collective strength are often used to determine the interestingness of association patterns. However, many such measures provide conflicting information about the interestingness of a pattern, and the best metric to use for a given application domain is rarely known. In this paper, we present an overview of various measures proposed in the statistics, machine learning and data mining literature. We describe several key properties one should examine in order to select the right measure for a given application domain. A comparative study of these properties is made using twenty-one of the existing measures. We show that each measure has different properties which make it useful for some application domains, but not for others. We also present two scenarios in which most of the existing measures agree with each other, namely, support-based pruning and table standardization. Finally, we present an algorithm to select a small set of tables such that an expert can select a desirable measure by looking at just this small set of tables.

985 citations
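
As an illustration of three of the measures this survey compares, here is a minimal Python sketch (not code from the paper; the function name, rule, and transactions are invented for the example):

```python
# Minimal sketch (not from the paper): computing three of the measures the
# survey compares (support, confidence, and lift) for a rule A -> B.
def rule_measures(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    `transactions` is a list of item sets; `antecedent` and `consequent`
    are item sets."""
    n = len(transactions)
    count_a = sum(1 for t in transactions if antecedent <= t)
    count_b = sum(1 for t in transactions if consequent <= t)
    count_ab = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = count_ab / n                                  # P(A and B)
    confidence = count_ab / count_a if count_a else 0.0     # P(B | A)
    lift = confidence / (count_b / n) if count_b else 0.0   # P(A,B) / (P(A) P(B))
    return support, confidence, lift

# Toy usage: lift > 1 means bread and butter co-occur more than chance predicts.
baskets = [{"bread", "butter"}, {"bread", "butter"}, {"milk"}, {"bread"}]
print(rule_measures(baskets, {"bread"}, {"butter"}))  # (0.5, ~0.667, ~1.333)
```

The paper's point is that these measures can rank the same set of patterns very differently, which is why the choice of measure matters per application domain.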

Journal ArticleDOI
TL;DR: It is found that there is no need to under-sample so that there are as many churners as non-churners in the training set, and that under-sampling can lead to improved prediction accuracy, especially when evaluated with AUC.
Abstract: Customer churn is often a rare event in service industries, but one of great interest and great value. Until recently, however, class imbalance has not received much attention in the context of data mining [Weiss, G. M. (2004). Mining with rarity: A unifying framework. SIGKDD Explorations, 6(1), 7-19]. In this study, we investigate how to better handle class imbalance in churn prediction. Using more appropriate evaluation metrics (AUC, lift), we investigated the increase in performance of sampling (both random and advanced under-sampling) and two specific modelling techniques (gradient boosting and weighted random forests) compared to some standard modelling techniques. AUC and lift prove to be good evaluation metrics. AUC does not depend on a threshold and is therefore a better overall evaluation metric than accuracy. Lift is closely related to accuracy, but has the advantage of being widely used in marketing practice [Ling, C., & Li, C. (1998). Data mining for direct marketing: problems and solutions. In Proceedings of the fourth international conference on knowledge discovery and data mining (KDD-98). New York, NY: AAAI Press]. Results show that under-sampling can lead to improved prediction accuracy, especially when evaluated with AUC. Unlike Ling and Li (1998), we find that there is no need to under-sample so that there are as many churners as non-churners in the training set. Results show no increase in predictive performance when using the advanced sampling technique CUBE in this study. This is in line with the findings of Japkowicz [Japkowicz, N. (2000). The class imbalance problem: significance and strategies. In Proceedings of the 2000 international conference on artificial intelligence (IC-AI'2000): Special track on inductive learning, Las Vegas, Nevada], who noted that using sophisticated sampling techniques did not give any clear advantage. Weighted random forests, as a cost-sensitive learner, perform significantly better than random forests and are therefore advised, although they should always be compared to logistic regression. Boosting is a very robust classifier, but never outperforms any other technique.

462 citations
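
The lift metric used in this study's evaluation is commonly computed on the top decile of model scores. A minimal sketch, assuming NumPy and a hypothetical set of churn scores (a generic computation, not the authors' code):

```python
import numpy as np

def top_decile_lift(y_true, y_score, fraction=0.10):
    """Churn rate among the `fraction` highest-scored customers, divided by
    the overall churn rate. A lift of 3.0 means the model's top decile holds
    three times as many churners as a random sample of the same size."""
    y_true = np.asarray(y_true)
    order = np.argsort(y_score)[::-1]        # indices sorted by score, highest first
    k = max(1, int(len(y_true) * fraction))  # size of the top decile
    return y_true[order[:k]].mean() / y_true.mean()

# Toy usage: 3 churners among 10 customers, base rate 0.3; the single
# top-scored customer is a churner, so lift = 1.0 / 0.3, about 3.33.
y = np.array([1, 0, 0, 1, 0, 0, 0, 0, 1, 0])
scores = np.array([0.9, 0.2, 0.3, 0.8, 0.1, 0.4, 0.2, 0.3, 0.7, 0.1])
print(top_decile_lift(y, scores))
```

Unlike accuracy, this quantity directly answers the marketing question the paper cares about: how much better a campaign targets churners when it contacts only the customers the model ranks highest.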

Proceedings ArticleDOI
09 Dec 2006
TL;DR: This paper proposes a low-overhead, software-only information flow tracking system, called LIFT, which minimizes run-time overhead by exploiting dynamic binary instrumentation and optimizations for detecting various types of security attacks without requiring any hardware changes.
Abstract: Computer security is severely threatened by software vulnerabilities. Prior work shows that information flow tracking (also referred to as taint analysis) is a promising technique to detect a wide range of security attacks. However, current information flow tracking systems are not very practical, because they either require program annotations, source code, non-trivial hardware extensions, or incur prohibitive runtime overheads. This paper proposes a low overhead, software-only information flow tracking system, called LIFT, which minimizes run-time overhead by exploiting dynamic binary instrumentation and optimizations for detecting various types of security attacks without requiring any hardware changes. More specifically, LIFT aggressively eliminates unnecessary dynamic information flow tracking, coalesces information checks, and efficiently switches between target programs and instrumented information flow tracking code. We have implemented LIFT on a dynamic binary instrumentation framework on Windows. Our real-system experiments with two real-world server applications, one client application and eighteen attack benchmarks show that LIFT can effectively detect various types of security attacks. LIFT also incurs very low overhead, only 6.2% for server applications, and 3.6 times on average for seven SPEC INT2000 applications. Our dynamic optimizations are very effective in reducing the overhead by a factor of 5-12 times.

435 citations
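
LIFT itself instruments x86 binaries, but the taint-propagation semantics it enforces can be modeled in a few lines. A toy conceptual sketch (all names invented; this is not LIFT's implementation):

```python
# Toy model of dynamic information flow (taint) tracking. LIFT enforces
# rules like these on x86 binaries via dynamic binary instrumentation.

class Tainted:
    """A value paired with a one-bit taint tag."""
    def __init__(self, value, tainted=False):
        self.value, self.tainted = value, tainted

    def __add__(self, other):
        # Propagation rule: a result is tainted if either operand is tainted.
        return Tainted(self.value + other.value, self.tainted or other.tainted)

def indirect_jump(target):
    # Security check: a control-transfer target derived from untrusted
    # input signals a likely control-hijacking attack.
    if target.tainted:
        raise RuntimeError("ALERT: tainted value used as jump target")
    print(f"jumping to {target.value:#x}")

code_addr = Tainted(0x401000)                  # trusted code address
payload = Tainted(0x41414141, tainted=True)    # e.g., bytes read from a socket
indirect_jump(code_addr + Tainted(8))          # fine: both operands are clean
indirect_jump(code_addr + payload)             # raises: taint propagated through +
```

LIFT's actual contribution is making this tracking cheap, for example by eliminating tracking on code proven to handle only untainted data and by coalescing checks; the sketch above only models the semantics those optimizations preserve.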

Journal ArticleDOI
TL;DR: Lift is proposed, an intuitive yet effective algorithm that constructs features specific to each label by conducting clustering analysis on its positive and negative instances, and then performs training and testing by querying the clustering results.
Abstract: Multi-label learning deals with the problem where each example is represented by a single instance (feature vector) while associated with a set of class labels. Existing approaches learn from multi-label data by manipulating an identical feature set, i.e. the very same instance representation of each example is employed in the discrimination processes of all class labels. However, this popular strategy might be suboptimal as each label is supposed to possess specific characteristics of its own. In this paper, another strategy to learn from multi-label data is studied, where label-specific features are exploited to benefit the discrimination of different class labels. Accordingly, an intuitive yet effective algorithm named Lift, i.e. multi-label learning with Label specIfic FeaTures, is proposed. Lift first constructs features specific to each label by conducting clustering analysis on its positive and negative instances, and then performs training and testing by querying the clustering results. Comprehensive experiments on a total of 17 benchmark data sets clearly validate the superiority of Lift against other well-established multi-label learning algorithms as well as the effectiveness of label-specific features.

371 citations
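
A hedged sketch of the feature-construction step the abstract describes, assuming scikit-learn's KMeans; the function name and the ratio-based cluster count are simplifications of the paper's recipe, not its exact formula:

```python
import numpy as np
from sklearn.cluster import KMeans

def label_specific_features(X, y_label, ratio=0.1, random_state=0):
    """Build label-specific features for one label, roughly following Lift's
    recipe: cluster the label's positive and negative instances separately,
    then represent every example by its distances to all cluster centers."""
    pos, neg = X[y_label == 1], X[y_label == 0]
    k = max(1, int(ratio * min(len(pos), len(neg))))   # clusters per side
    centers = np.vstack([
        KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(pos).cluster_centers_,
        KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(neg).cluster_centers_,
    ])
    # Transformed representation: Euclidean distance to each of the 2k centers.
    return np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)

# Usage: build features for one label, then train any binary classifier on them.
rng = np.random.RandomState(0)
X = rng.randn(100, 5)
y = (X[:, 0] > 0).astype(int)       # a synthetic binary label
Z = label_specific_features(X, y)   # shape: (100, 2 * k)
```

Repeating this per label gives each binary classification problem its own tailored representation, which is the strategy the paper argues outperforms a single shared feature set.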


Network Information
Related Topics (5)
Reynolds number: 68.4K papers, 1.6M citations, 70% related
Turbine: 106.6K papers, 1M citations, 68% related
Internal combustion engine: 130.5K papers, 1M citations, 66% related
Turbulence: 112.1K papers, 2.7M citations, 65% related
Laminar flow: 56K papers, 1.2M citations, 65% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    1,110
2022    2,242
2021    205
2020    185
2019    176
2018    223