Author

S. Gnanapriya

Bio: S. Gnanapriya is an academic researcher. The author has contributed to research in the topics of Data pre-processing and Data warehouse, has an h-index of 1, and has co-authored 1 publication receiving 4,646 citations.

Papers
Journal Article
TL;DR: Data mining is the search for new, valuable, and nontrivial information in large volumes of data, carried out as a cooperative effort of humans and computers. Data-mining activities fall into one of two categories: predictive data mining, which produces a model of the system described by the given data set, and descriptive data mining, which produces new, nontrivial information based on the available data set.
Abstract: Objectives: understand the need for analyses of large, complex, information-rich data sets; identify the goals and primary tasks of the data-mining process; describe the roots of data-mining technology; recognize the iterative character of a data-mining process and specify its basic steps; explain the influence of data quality on a data-mining process; and establish the relation between data warehousing and data mining.

Data mining is an iterative process within which progress is defined by discovery, through either automatic or manual methods. It is most useful in an exploratory analysis scenario in which there are no predetermined notions about what will constitute an "interesting" outcome. Data mining is the search for new, valuable, and nontrivial information in large volumes of data. It is a cooperative effort of humans and computers: the best results are achieved by balancing the knowledge of human experts in describing problems and goals with the search capabilities of computers. In practice, the two primary goals of data mining tend to be prediction and description. Prediction involves using some variables or fields in the data set to predict unknown or future values of other variables of interest. Description, on the other hand, focuses on finding patterns describing the data that can be interpreted by humans. It is therefore possible to put data-mining activities into one of two categories: predictive data mining, which produces a model of the system described by the given data set, or descriptive data mining, which produces new, nontrivial information based on the available data set.
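As a concrete illustration of the two categories, the sketch below pairs a predictive task (classification) with a descriptive one (clustering). The scikit-learn calls, dataset, and model choices are illustrative assumptions, not taken from the text itself.

```python
# Minimal sketch of the two data-mining categories described above,
# using scikit-learn. Dataset and model choices are illustrative
# assumptions, not taken from the paper.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)

# Predictive data mining: build a model of the system described by the
# data, then use it to predict unknown values of a variable of interest.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = DecisionTreeClassifier().fit(X_train, y_train)
print("predictive accuracy:", model.score(X_test, y_test))

# Descriptive data mining: find human-interpretable patterns in the
# available data, here as cluster structure (no target variable used).
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print("descriptive cluster sizes:", [int((clusters == k).sum()) for k in range(3)])
```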

4,646 citations


Cited by
Journal Article
TL;DR: The experimental results show that the proposed black hole algorithm outperforms other traditional heuristic algorithms for several benchmark datasets.
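The TL;DR above gives no implementation details, but the black hole heuristic it refers to is commonly described as follows: candidate solutions ("stars") drift toward the current best solution (the "black hole"), and any star crossing the event horizon is swallowed and replaced by a fresh random one. The sketch below applies that scheme to a toy objective; the objective function, bounds, and parameters are assumptions, not the paper's clustering setup.

```python
# Hedged sketch of the black hole heuristic on a toy objective (sphere
# function). The paper applies it to data clustering; objective, bounds,
# and population size here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def objective(x):
    return float(np.sum(x ** 2))  # toy fitness; lower is better

dim, n_stars, iters = 5, 20, 200
stars = rng.uniform(-10, 10, size=(n_stars, dim))

for _ in range(iters):
    fitness = np.array([objective(s) for s in stars])
    bh = int(np.argmin(fitness))           # best star becomes the black hole
    # Every star drifts toward the black hole by a random fraction.
    stars += rng.random((n_stars, 1)) * (stars[bh] - stars)
    fitness = np.array([objective(s) for s in stars])
    bh = int(np.argmin(fitness))
    # Event horizon: stars that come too close are swallowed and
    # replaced by fresh random stars, preserving exploration.
    radius = fitness[bh] / (np.sum(fitness) + 1e-12)
    for i in range(n_stars):
        if i != bh and np.linalg.norm(stars[i] - stars[bh]) < radius:
            stars[i] = rng.uniform(-10, 10, size=dim)

print("best solution:", stars[bh], "fitness:", objective(stars[bh]))
```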

963 citations

Journal Article
01 Feb 2011
TL;DR: The main data mining techniques used for FFD are logistic models, neural networks, the Bayesian belief network, and decision trees, all of which provide primary solutions to the problems inherent in the detection and classification of fraudulent data.
Abstract: This paper presents a review of, and classification scheme for, the literature on the application of data mining techniques to the detection of financial fraud. Although financial fraud detection (FFD) is an emerging topic of great importance, a comprehensive literature review of the subject has yet to be carried out. This paper thus represents the first systematic, identifiable, and comprehensive academic literature review of the data mining techniques that have been applied to FFD. 49 journal articles on the subject published between 1997 and 2008 were analyzed and classified into four categories of financial fraud (bank fraud, insurance fraud, securities and commodities fraud, and other related financial fraud) and six classes of data mining techniques (classification, regression, clustering, prediction, outlier detection, and visualization). The findings of this review clearly show that data mining techniques have been applied most extensively to the detection of insurance fraud, although corporate fraud and credit card fraud have also attracted a great deal of attention in recent years. In contrast, we find a distinct lack of research on mortgage fraud, money laundering, and securities and commodities fraud. The main data mining techniques used for FFD are logistic models, neural networks, the Bayesian belief network, and decision trees, all of which provide primary solutions to the problems inherent in the detection and classification of fraudulent data. This paper also addresses the gaps between FFD research and the needs of the industry to encourage additional work on neglected topics, and concludes with several suggestions for further FFD research.
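For orientation, the sketch below applies one of the four technique families named above, a logistic model, to synthetic transaction data. The features and the fraud-generating process are invented for illustration; the review itself reports no dataset.

```python
# Hedged illustration of one technique family named above (logistic
# models) for fraud classification. The synthetic transactions are an
# assumption made for demonstration only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(42)
n = 2000
amount = rng.exponential(scale=100, size=n)   # transaction amount
hour = rng.integers(0, 24, size=n)            # hour of day

# Assumed generative story: fraud is more likely for large,
# late-night transactions.
p_fraud = 1 / (1 + np.exp(-(0.01 * amount + 2.0 * (hour < 5) - 4)))
y = rng.random(n) < p_fraud
X = np.column_stack([amount, hour])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))
```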

917 citations

Journal Article
TL;DR: A systematic literature review on articles published from 2008 to 2012 on the application of DM techniques for supplier selection is provided by using a methodological decision analysis in four aspects including decision problems, decision makers, decision environments, and decision approaches.
Abstract: Despite the importance of decision-making (DM) techniques for the construction of effective decision models for supplier selection, a systematic literature review of the topic has been lacking. This paper provides a systematic literature review of articles published from 2008 to 2012 on the application of DM techniques to supplier selection. Using a methodological decision analysis in four aspects (decision problems, decision makers, decision environments, and decision approaches), we selected and reviewed 123 journal articles. To examine the research trend on uncertain supplier selection, these articles are roughly classified into seven categories according to different uncertainties. Under this classification framework, 26 DM techniques are identified from three perspectives: (1) multicriteria decision making (MCDM) techniques, (2) mathematical programming (MP) techniques, and (3) artificial intelligence (AI) techniques. We reviewed each of the 26 techniques and analyzed the means of integrating them for supplier selection. Our survey provides recommendations for future research and facilitates knowledge accumulation and creation concerning the application of DM techniques to supplier selection.
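As a minimal example of the MCDM family identified above, the sketch below ranks suppliers with a simple weighted sum over normalized criteria. The criteria, weights, and scores are invented for illustration and do not come from the reviewed articles.

```python
# Minimal sketch of one MCDM technique (a simple weighted sum, a common
# baseline in the surveyed family) for supplier selection. All values
# below are invented for illustration.
import numpy as np

suppliers = ["S1", "S2", "S3"]
criteria = ["cost", "quality", "delivery"]   # cost is "lower is better"
weights = np.array([0.4, 0.4, 0.2])          # assumed decision-maker weights
scores = np.array([[70.0, 8.0, 6.0],
                   [55.0, 7.0, 9.0],
                   [60.0, 9.0, 7.0]])

# Normalize each criterion to [0, 1]; invert cost so higher is better.
norm = scores / scores.max(axis=0)
norm[:, 0] = scores[:, 0].min() / scores[:, 0]

ranking = norm @ weights
for name, r in sorted(zip(suppliers, ranking), key=lambda t: -t[1]):
    print(f"{name}: {r:.3f}")
```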

825 citations

Journal Article
TL;DR: Experimental results demonstrate that the classification accuracy rates of the developed approach surpass those of grid search and many other approaches, and that the developed PSO+SVM approach achieves results similar to GA+SVM; the PSO+SVM approach is therefore valuable for parameter determination and feature selection in an SVM.
Abstract: Support vector machine (SVM) is a popular pattern classification method with many diverse applications. Kernel parameter setting in the SVM training procedure, along with feature selection, significantly influences classification accuracy. This study simultaneously determines the parameter values and discovers a subset of features, without reducing SVM classification accuracy. A particle swarm optimization (PSO) based approach for parameter determination and feature selection of the SVM, termed PSO+SVM, is developed. Several public datasets are employed to calculate classification accuracy rates in order to evaluate the developed PSO+SVM approach. The developed approach was compared with grid search, a conventional method of searching for parameter values, and with other approaches. Experimental results demonstrate that the classification accuracy rates of the developed approach surpass those of grid search and many other approaches, and that the developed PSO+SVM approach achieves results similar to GA+SVM. Therefore, the PSO+SVM approach is valuable for parameter determination and feature selection in an SVM.
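A rough sketch of the PSO+SVM idea follows: each particle encodes the SVM hyperparameters (log C, log gamma) together with one value per feature, thresholded at 0.5 to select a subset, and fitness is cross-validated accuracy. The swarm settings and dataset are illustrative assumptions, not the paper's experimental configuration.

```python
# Hedged sketch of PSO-based joint parameter determination and feature
# selection for an SVM. Swarm size, inertia, and dataset are assumptions.
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = load_wine(return_X_y=True)
n_feat = X.shape[1]
dim = 2 + n_feat                    # [log2 C, log2 gamma, feature mask...]

def fitness(p):
    mask = p[2:] > 0.5              # threshold selects the feature subset
    if not mask.any():
        return 0.0
    svm = make_pipeline(StandardScaler(),
                        SVC(C=2.0 ** p[0], gamma=2.0 ** p[1]))
    return cross_val_score(svm, X[:, mask], y, cv=3).mean()

n_particles, iters, w, c1, c2 = 10, 15, 0.7, 1.5, 1.5
lo = np.array([-5.0, -10.0] + [0.0] * n_feat)
hi = np.array([10.0, 2.0] + [1.0] * n_feat)
pos = rng.uniform(lo, hi, size=(n_particles, dim))
vel = np.zeros((n_particles, dim))
pbest, pbest_f = pos.copy(), np.array([fitness(p) for p in pos])
g = int(np.argmax(pbest_f))
gbest, gbest_f = pbest[g].copy(), pbest_f[g]

for _ in range(iters):
    r1, r2 = rng.random((2, n_particles, dim))
    # Standard PSO velocity update: inertia + cognitive + social terms.
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, lo, hi)
    f = np.array([fitness(p) for p in pos])
    improved = f > pbest_f
    pbest[improved], pbest_f[improved] = pos[improved], f[improved]
    if pbest_f.max() > gbest_f:
        g = int(np.argmax(pbest_f))
        gbest, gbest_f = pbest[g].copy(), pbest_f[g]

print(f"best CV accuracy {gbest_f:.3f} "
      f"with {int((gbest[2:] > 0.5).sum())} of {n_feat} features")
```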

802 citations

Journal Article
TL;DR: The independence assumptions in cross-validation are introduced, along with the circumstances that satisfy them; these assumptions are then used to derive the sampling distributions of the point estimators for k-fold and leave-one-out cross-validation.
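A brief sketch of the two point estimators in question, assuming a standard scikit-learn setup; the dataset and classifier are illustrative assumptions.

```python
# Hedged sketch of the two point estimators the paper analyzes: the
# k-fold and leave-one-out cross-validation (CV) estimates of accuracy.
# Note that the per-fold scores are NOT independent samples, which is
# precisely the subtlety the paper's independence assumptions address.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000)

# k-fold CV: average accuracy over k disjoint test folds.
kfold_scores = cross_val_score(
    clf, X, y, cv=KFold(n_splits=10, shuffle=True, random_state=0))
print(f"10-fold point estimate: {kfold_scores.mean():.3f}")

# Leave-one-out CV: n folds, each holding out a single observation.
loo_scores = cross_val_score(clf, X, y, cv=LeaveOneOut())
print(f"leave-one-out point estimate: {loo_scores.mean():.3f}")
```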

798 citations