Journal Article

Integrating Global and Local Application of Naive Bayes Classifier

TL;DR: A large-scale comparison with other attempts that have tried to improve the accuracy of the Naive Bayes algorithm as well as other state-of-the-art algorithms on 28 standard benchmark datasets shows that the proposed method gave better accuracy in most cases.
Abstract: The Naive Bayes algorithm captures the assumption that every attribute is independent of the rest of the attributes, given the state of the class attribute. In this study, we attempted to increase the prediction accuracy of the simple Bayes model by integrating global and local application of the Naive Bayes classifier. We performed a large-scale comparison with other attempts that have tried to improve the accuracy of the Naive Bayes algorithm as well as other state-of-the-art algorithms on 28 standard benchmark datasets, and the proposed method gave better accuracy in most cases.
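The conditional-independence assumption described above can be illustrated with a minimal sketch of a Naive Bayes classifier for discrete attributes (an illustrative implementation, not the paper's code; the Laplace smoothing is an added assumption):

```python
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Estimate class priors and per-attribute value counts.
    Attributes are assumed discrete (nominal)."""
    priors = Counter(labels)
    cond = defaultdict(Counter)  # (class, attribute index) -> value counts
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            cond[(y, i)][v] += 1
    return priors, cond

def predict_nb(priors, cond, row):
    """Pick the class c maximizing P(c) * prod_i P(x_i | c),
    i.e. treating every attribute as independent given the class."""
    total = sum(priors.values())
    best, best_p = None, -1.0
    for c, n in priors.items():
        p = n / total
        for i, v in enumerate(row):
            counts = cond[(c, i)]
            p *= (counts[v] + 1) / (n + len(counts) + 1)  # Laplace smoothing
        if p > best_p:
            best, best_p = c, p
    return best
```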
Citations
Book ChapterDOI
15 Apr 2015
TL;DR: A computer assisted diagnosis method based on a wavelet-entropy approach and a Naive Bayes classifier classification method for improving the brain diagnosis accuracy by means of NMR images is presented.
Abstract: An accurate diagnosis is important for the medical treatment of patients suffering from brain disease. Nuclear magnetic resonance (NMR) images are commonly used by technicians to assist pre-clinical diagnosis, rating them by visual evaluations. The classification of NMR images of normal and pathological brains poses a challenge from a technological point of view, since NMR imaging generates a large information set that reflects the conditions of the brain. In this work, we present a computer-assisted diagnosis method based on a wavelet-entropy feature space (a 2D discrete wavelet transform is used here, as it can extract more information) and a Naive Bayes classifier for improving brain diagnosis accuracy by means of NMR images. The most relevant image feature, the wavelet entropy, is used to train a Naive Bayes classifier. The results over 64 images show that the sensitivity of the classifier is as high as 94.50%, the specificity 91.70%, and the overall accuracy 92.60%. The data show that the proposed classifier can detect abnormal brains from normal controls with excellent performance, which is competitive with the latest existing methods.
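The wavelet-entropy feature used above can be sketched as follows. This is an illustrative 1D Haar decomposition (an assumption: the paper uses a 2D discrete wavelet transform; this sketch only shows the idea) followed by the Shannon entropy of the relative sub-band energies. The input length is assumed divisible by 2^levels:

```python
import math

def haar_step(signal):
    """One Haar transform level: pairwise averages (approximation)
    and differences (detail), each scaled by 1/sqrt(2)."""
    a = [(signal[i] + signal[i + 1]) / math.sqrt(2) for i in range(0, len(signal), 2)]
    d = [(signal[i] - signal[i + 1]) / math.sqrt(2) for i in range(0, len(signal), 2)]
    return a, d

def wavelet_entropy(signal, levels=2):
    """Shannon entropy of the relative energy per wavelet sub-band."""
    bands = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        bands.append(detail)
    bands.append(approx)  # final approximation band
    energies = [sum(x * x for x in b) for b in bands]
    total = sum(energies) or 1.0
    probs = [e / total for e in energies if e > 0]
    return -sum(p * math.log2(p) for p in probs)
```

A flat signal concentrates all energy in the approximation band, giving entropy 0; energy spread across bands raises the entropy.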

102 citations


Cites result from "Integrating Global and Local Applic..."

  • ...NBC is widely recognized as a simple and effective probabilistic classification method [12], and its performance is comparable with or higher than those of the decision tree [13] and neural network [14]....


Journal ArticleDOI
01 Oct 2016-Entropy
TL;DR: This work analyzed the standard EPA-HTTP dataset and selected the parameters that will be used as input to the classifier model for differentiating the attack from normal profile, and the proposed model can provide a better accuracy, sensitivity, and specificity than other traditional classification models.
Abstract: Distributed denial-of-service (DDoS) attack is one of the major threats to the web server. The rapid increase of DDoS attacks on the Internet has clearly pointed out the limitations in current intrusion detection systems or intrusion prevention systems (IDS/IPS), mostly caused by application-layer DDoS attacks. Within this context, the objective of the paper is to detect a DDoS attack using a multilayer perceptron (MLP) classification algorithm with genetic algorithm (GA) as learning algorithm. In this work, we analyzed the standard EPA-HTTP (environmental protection agency-hypertext transfer protocol) dataset and selected the parameters that will be used as input to the classifier model for differentiating the attack from normal profile. The parameters selected are the HTTP GET request count, entropy, and variance for every connection. The proposed model can provide a better accuracy of 98.31%, sensitivity of 0.9962, and specificity of 0.0561 when compared to other traditional classification models.
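The per-connection features named above (HTTP GET request count, entropy, and variance) can be sketched as follows. This is illustrative only; the function name and the per-client request-count input are assumptions, not the paper's exact formulation:

```python
import math

def connection_features(requests_per_client):
    """Compute features over one time window from a mapping of
    client -> number of HTTP GET requests: total GET count,
    Shannon entropy of the request distribution, and the variance
    of per-client request counts."""
    counts = list(requests_per_client.values())
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    mean = total / len(counts)
    variance = sum((c - mean) ** 2 for c in counts) / len(counts)
    return total, entropy, variance
```

Intuitively, flood attacks tend to skew these statistics: a few sources issuing many requests lowers the entropy and inflates the variance relative to normal traffic.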

60 citations


Cites methods from "Integrating Global and Local Applic..."

  • ...After the successful classification into either attack or normal class, we compared the classification strength of MLP-GA with that of other state-of-the-art classification models, such as MLP [24], radial basis function (RBF) network [25], naive Bayes [26], and random forest [27] in a population size of 357....


  • ...Kotsiantis, S. Integrating Global and Local Application of Naive Bayes Classifier....


  • ...Comparison of receiver operating characteristic (ROC) curve of MLP-genetic algorithm (GA) with (a) radial basis function (RBF) network, (b) naive Bayes, (c) random forest, and (d) multilayer perceptron....


Journal ArticleDOI
01 Oct 2017
TL;DR: The experimental results show that MLP-GA provides the best efficiency, 98.04%, for detecting layer-seven DDoS attacks, and the proposed method yields the lowest false-positive rate when compared with traditional classifiers such as Naive Bayes, Radial Basis Function, MLP, J48, and C4.5.
Abstract: The Distributed Denial of Service (DDoS) attack is being turned into a weapon by attackers, politicians, and cyber terrorists. Research on mitigation and defense against DDoS attacks is rising quickly, but in reality the capabilities of hackers are also growing. Having initially targeted the network and transport layers, attacks now focus on the application layer. In the paper, we first analyze the features of incoming packets, including the Hyper Text Transfer Protocol (HTTP) count, the number of Internet Protocol (IP) addresses during a time window, the constant mapping of the port number, and the frame of the packets. We enumerate the combinations of these metrics and then analyze client behavior on public attack and normal datasets. We use the Environmental Protection Agency-Hypertext Transfer Protocol (EPA-HTTP) DDoS dataset, the Center for Applied Internet Data Analysis (CAIDA) 2007 dataset, and an experimentally produced DDoS dataset using the Slowloris attack to assess the efficiency and effectiveness of the features for layer-seven DDoS detection. Second, we employ a Multilayer Perceptron with a Genetic Algorithm (MLP-GA) to estimate detection efficiency using these metrics. The experimental results show that MLP-GA provides the best efficiency, 98.04%, for detecting layer-seven DDoS attacks. The proposed method yields a minimum false-positive rate when compared with traditional classifiers such as Naive Bayes, Radial Basis Function (RBF) Network, MLP, J48, and C4.5.

40 citations

Journal ArticleDOI
TL;DR: This article proposes an ensemble feature selection algorithm to determine which attribute in the given training datasets is efficient in categorizing the classes and uses a multilayer perceptron classifier as the final classifier, as it provides better accuracy when compared to other conventional classification models.
Abstract: In the current cyber world, one of the most severe cyber threats is the distributed denial of service (DDoS) attack, which makes websites and other online resources unavailable to legitimate clients. It is different from other cyber threats that breach security parameters; DDoS is a short-term attack that brings down the server temporarily. Appropriate selection of features plays a crucial role for effective detection of DDoS attacks. Too many irrelevant features not only produce unrelated class categories but also increase computation overhead. In this article, we propose an ensemble feature selection algorithm to determine which attribute in the given training datasets is efficient in categorizing the classes. The result of the ensemble algorithm when compared to a threshold value will enable us to decide the features. The selected features are deployed as training inputs for various classifiers to select a classifier that yields maximum accuracy. We use a multilayer perceptron classifier as the final classifier, as it provides better accuracy when compared to other conventional classification models. The proposed method classifies the new datasets into either attack or normal classes with an efficiency of 98.3% and also reduces the overall computation time. We use the CAIDA 2007 dataset to evaluate the performance of the proposed method using MATLAB and Weka 3.6 simulators.
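The threshold-based ensemble feature selection idea described above can be sketched minimally as follows (illustrative only; the score-averaging scheme and names are assumptions, not the article's exact algorithm):

```python
def ensemble_select(score_lists, threshold):
    """Combine feature-relevance scores from several rankers by
    averaging, then keep the indices of features whose mean score
    exceeds the threshold. Each inner list holds one ranker's
    normalized score per feature, in the same feature order."""
    n_features = len(score_lists[0])
    selected = []
    for j in range(n_features):
        mean = sum(scores[j] for scores in score_lists) / len(score_lists)
        if mean > threshold:
            selected.append(j)
    return selected
```

The surviving feature indices would then be used to project the training data before fitting the final classifier.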

19 citations


Cites methods from "Integrating Global and Local Applic..."

  • ...We introduce the application of various classifiers, such as multilayer perceptron (MLP) with the backpropagation method [25], naive Bayes [11], random forest [1], and radial basis function (RBF) network [5], to classify the dataset into attack and normal classes....


  • ...In this article, we deploy MLP, naive Bayes, RBF network, and random forest, which are machine learning classifiers for training the common features along with the target....



  • ...We also run Weka 3.6 with the incremental naive Bayes classifier [10] to compare the accuracy of the two models with their respective probability of RMSE generated as shown in Figure 10A and B....


References
Book
15 Oct 1992
TL;DR: A complete guide to the C4.5 system as implemented in C for the UNIX environment, which starts from simple core learning methods and shows how they can be elaborated and extended to deal with typical problems such as missing data and overfitting.
Abstract: From the Publisher: Classifier systems play a major role in machine learning and knowledge-based systems, and Ross Quinlan's work on ID3 and C4.5 is widely acknowledged to have made some of the most significant contributions to their development. This book is a complete guide to the C4.5 system as implemented in C for the UNIX environment. It contains a comprehensive guide to the system's use, the source code (about 8,800 lines), and implementation notes. The source code and sample datasets are also available on a 3.5-inch floppy diskette for a Sun workstation. C4.5 starts with large sets of cases belonging to known classes. The cases, described by any mixture of nominal and numeric properties, are scrutinized for patterns that allow the classes to be reliably discriminated. These patterns are then expressed as models, in the form of decision trees or sets of if-then rules, that can be used to classify new cases, with emphasis on making the models understandable as well as accurate. The system has been applied successfully to tasks involving tens of thousands of cases described by hundreds of properties. The book starts from simple core learning methods and shows how they can be elaborated and extended to deal with typical problems such as missing data and overfitting. Advantages and disadvantages of the C4.5 approach are discussed and illustrated with several case studies. This book and software should be of interest to developers of classification-based intelligent systems and to students in machine learning and expert systems courses.
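The core of the decision-tree induction described above is choosing the split attribute by information gain (which C4.5 refines into the gain ratio). A minimal sketch for nominal attributes, as an illustration rather than the book's implementation:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of the class distribution."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Reduction in class entropy after partitioning the cases
    by the values of one nominal attribute."""
    base = entropy(labels)
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    remainder = sum(len(g) / len(labels) * entropy(g)
                    for g in groups.values())
    return base - remainder

def best_split(rows, labels):
    """Attribute index with the highest information gain."""
    return max(range(len(rows[0])), key=lambda a: info_gain(rows, labels, a))
```

A tree is grown by applying `best_split` recursively to each partition until the cases in a node share one class.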

21,674 citations

Book
25 Oct 1999
TL;DR: This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining.
Abstract: Data Mining: Practical Machine Learning Tools and Techniques offers a thorough grounding in machine learning concepts as well as practical advice on applying machine learning tools and techniques in real-world data mining situations. This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining. Thorough updates reflect the technical changes and modernizations that have taken place in the field since the last edition, including new material on Data Transformations, Ensemble Learning, Massive Data Sets, Multi-instance Learning, plus a new version of the popular Weka machine learning software developed by the authors. Witten, Frank, and Hall include both tried-and-true techniques of today as well as methods at the leading edge of contemporary research. *Provides a thorough grounding in machine learning concepts as well as practical advice on applying the tools and techniques to your data mining projects *Offers concrete tips and techniques for performance improvement that work by transforming the input or output in machine learning methods *Includes the downloadable Weka software toolkit, a collection of machine learning algorithms for data mining tasks, in an updated, interactive interface. Algorithms in the toolkit cover: data pre-processing, classification, regression, clustering, association rules, visualization

20,196 citations

Journal ArticleDOI
TL;DR: This paper provides an introduction to the WEKA workbench, reviews the history of the project, and, in light of the recent 3.6 stable release, briefly discusses what has been added since the last stable version (Weka 3.4) released in 2003.
Abstract: More than twelve years have elapsed since the first public release of WEKA. In that time, the software has been rewritten entirely from scratch, evolved substantially and now accompanies a text on data mining [35]. These days, WEKA enjoys widespread acceptance in both academia and business, has an active community, and has been downloaded more than 1.4 million times since being placed on Source-Forge in April 2000. This paper provides an introduction to the WEKA workbench, reviews the history of the project, and, in light of the recent 3.6 stable release, briefly discusses what has been added since the last stable version (Weka 3.4) released in 2003.

19,603 citations


"Integrating Global and Local Applic..." refers methods in this paper

  • ...Eight well-known algorithms were used for the comparison: discretize simple Bayes [17], NB with kernel estimation [17], locally weighted naive Bayes [12], lazy Bayesian rule-learning algorithm [40], discretize NB [17], averaged one-dependence estimator [37], weightily averaged one-dependence estimator [19], hidden naive Bayes algorithm [20]....


01 Jan 2007

17,341 citations

Book
01 Jan 2008
TL;DR: In this paper, generalized estimating equations (GEE) with computing using PROC GENMOD in SAS and multilevel analysis of clustered binary data using generalized linear mixed-effects models with PROC LOGISTIC are discussed.
Abstract: tic regression, and it concerns studying the effect of covariates on the risk of disease. The chapter includes generalized estimating equations (GEE’s) with computing using PROC GENMOD in SAS and multilevel analysis of clustered binary data using generalized linear mixed-effects models with PROC LOGISTIC. As a prelude to the following chapter on repeated-measures data, Chapter 5 presents time series analysis. The material on repeated-measures analysis uses linear additive models with GEE’s and PROC MIXED in SAS for linear mixed-effects models. Chapter 7 is about survival data analysis. All computing throughout the book is done using SAS procedures.

9,995 citations