Journal Article

Integrating Global and Local Application of Naive Bayes Classifier

TL;DR: A large-scale comparison with other attempts that have tried to improve the accuracy of the Naive Bayes algorithm as well as other state-of-the-art algorithms on 28 standard benchmark datasets shows that the proposed method gave better accuracy in most cases.
Abstract: The Naive Bayes algorithm captures the assumption that every attribute is independent of the rest of the attributes, given the state of the class attribute. In this study, we attempted to increase the prediction accuracy of the simple Bayes model by integrating global and local application of the Naive Bayes classifier. We performed a large-scale comparison with other attempts that have tried to improve the accuracy of the Naive Bayes algorithm as well as other state-of-the-art algorithms on 28 standard benchmark datasets, and the proposed method gave better accuracy in most cases.
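The conditional-independence assumption described above can be illustrated with a minimal sketch of a Naive Bayes classifier for discrete attributes (an illustrative implementation, not the paper's code; the Laplace smoothing is an added assumption):

```python
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Estimate class priors and per-attribute value counts.
    Attributes are assumed discrete (nominal)."""
    priors = Counter(labels)
    cond = defaultdict(Counter)  # (class, attribute index) -> value counts
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            cond[(y, i)][v] += 1
    return priors, cond

def predict_nb(priors, cond, row):
    """Pick the class c maximizing P(c) * prod_i P(x_i | c),
    i.e. treating every attribute as independent given the class."""
    total = sum(priors.values())
    best, best_p = None, -1.0
    for c, n in priors.items():
        p = n / total
        for i, v in enumerate(row):
            counts = cond[(c, i)]
            p *= (counts[v] + 1) / (n + len(counts) + 1)  # Laplace smoothing
        if p > best_p:
            best, best_p = c, p
    return best
```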
Citations
Book ChapterDOI
15 Apr 2015
TL;DR: A computer assisted diagnosis method based on a wavelet-entropy approach and a Naive Bayes classifier classification method for improving the brain diagnosis accuracy by means of NMR images is presented.
Abstract: An accurate diagnosis is important for the medical treatment of patients suffering from brain disease. Nuclear magnetic resonance (NMR) images are commonly used by technicians to assist pre-clinical diagnosis, rating them by visual evaluations. The classification of NMR images of normal and pathological brains poses a challenge from a technological point of view, since NMR imaging generates a large information set that reflects the conditions of the brain. In this work, we present a computer-assisted diagnosis method based on a wavelet-entropy feature space (a 2D discrete wavelet transform is used here, as it can extract more information) and a Naive Bayes classifier for improving brain diagnosis accuracy by means of NMR images. The most relevant image feature, the wavelet entropy, is used to train a Naive Bayes classifier. The results over 64 images show that the sensitivity of the classifier is as high as 94.50%, the specificity 91.70%, and the overall accuracy 92.60%. The data show that the proposed classifier can detect abnormal brains from normal controls with excellent performance, which is competitive with the latest existing methods.
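The wavelet-entropy feature used above can be sketched as follows. This is an illustrative 1D Haar decomposition (an assumption: the paper uses a 2D discrete wavelet transform; this sketch only shows the idea) followed by the Shannon entropy of the relative sub-band energies. The input length is assumed divisible by 2^levels:

```python
import math

def haar_step(signal):
    """One Haar transform level: pairwise averages (approximation)
    and differences (detail), each scaled by 1/sqrt(2)."""
    a = [(signal[i] + signal[i + 1]) / math.sqrt(2) for i in range(0, len(signal), 2)]
    d = [(signal[i] - signal[i + 1]) / math.sqrt(2) for i in range(0, len(signal), 2)]
    return a, d

def wavelet_entropy(signal, levels=2):
    """Shannon entropy of the relative energy per wavelet sub-band."""
    bands = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        bands.append(detail)
    bands.append(approx)  # final approximation band
    energies = [sum(x * x for x in b) for b in bands]
    total = sum(energies) or 1.0
    probs = [e / total for e in energies if e > 0]
    return -sum(p * math.log2(p) for p in probs)
```

A flat signal concentrates all energy in the approximation band, giving entropy 0; energy spread across bands raises the entropy.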

102 citations


Cites result from "Integrating Global and Local Applic..."

  • ...NBC is widely recognized as a simple and effective probabilistic classification method [12], and its performance is comparable with or higher than those of the decision tree [13] and neural network [14]....


Journal ArticleDOI
01 Oct 2016-Entropy
TL;DR: This work analyzed the standard EPA-HTTP dataset and selected the parameters that will be used as input to the classifier model for differentiating the attack from normal profile, and the proposed model can provide a better accuracy, sensitivity, and specificity than other traditional classification models.
Abstract: Distributed denial-of-service (DDoS) attack is one of the major threats to the web server. The rapid increase of DDoS attacks on the Internet has clearly pointed out the limitations in current intrusion detection systems or intrusion prevention systems (IDS/IPS), mostly caused by application-layer DDoS attacks. Within this context, the objective of the paper is to detect a DDoS attack using a multilayer perceptron (MLP) classification algorithm with genetic algorithm (GA) as learning algorithm. In this work, we analyzed the standard EPA-HTTP (environmental protection agency-hypertext transfer protocol) dataset and selected the parameters that will be used as input to the classifier model for differentiating the attack from normal profile. The parameters selected are the HTTP GET request count, entropy, and variance for every connection. The proposed model can provide a better accuracy of 98.31%, sensitivity of 0.9962, and specificity of 0.0561 when compared to other traditional classification models.
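The per-connection features named above (HTTP GET request count, entropy, and variance) can be sketched as follows. This is illustrative only; the function name and the per-client request-count input are assumptions, not the paper's exact formulation:

```python
import math

def connection_features(requests_per_client):
    """Compute features over one time window from a mapping of
    client -> number of HTTP GET requests: total GET count,
    Shannon entropy of the request distribution, and the variance
    of per-client request counts."""
    counts = list(requests_per_client.values())
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    mean = total / len(counts)
    variance = sum((c - mean) ** 2 for c in counts) / len(counts)
    return total, entropy, variance
```

Intuitively, flood attacks tend to skew these statistics: a few sources issuing many requests lowers the entropy and inflates the variance relative to normal traffic.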

60 citations


Cites methods from "Integrating Global and Local Applic..."

  • ...After the successful classification into either attack or normal class, we compared the classification strength of MLP-GA with that of other state-of-the-art classification models, such as MLP [24], radial basis function (RBF) network [25], naive Bayes [26], and random forest [27] in a population size of 357....


  • ...Kotsiantis, S. Integrating Global and Local Application of Naive Bayes Classifier....


  • ...Comparison of receiver operating characteristic (ROC) curve of MLP-genetic algorithm (GA) with (a) radial basis function (RBF) network, (b) naive Bayes, (c) random forest, and (d) multilayer perceptron....


Journal ArticleDOI
01 Oct 2017
TL;DR: The experimental results show that MLP-GA provides the best efficiency, 98.04%, for detecting layer-seven DDoS attacks, and the proposed method yields the lowest false-positive rate when compared with traditional classifiers such as Naive Bayes, Radial Basis Function, MLP, J48, and C4.5.
Abstract: The Distributed Denial of Service (DDoS) attack is being turned into a weapon by attackers, politicians, and cyber terrorists. Research on mitigation and defense against DDoS attacks is rising quickly, but in reality the capabilities of hackers are also growing. Having initially targeted the network and transport layers, attacks now focus on the application layer. In the paper, we first analyze the features of incoming packets, including the Hyper Text Transfer Protocol (HTTP) count, the number of Internet Protocol (IP) addresses during a time window, the constant mapping of the port number, and the frame of the packets. We enumerate the combinations of these metrics and then analyze client behavior on public attack and normal datasets. We use the Environmental Protection Agency-Hypertext Transfer Protocol (EPA-HTTP) DDoS dataset, the Center for Applied Internet Data Analysis (CAIDA) 2007 dataset, and an experimentally produced DDoS dataset using the Slowloris attack to assess the efficiency and effectiveness of the features for layer-seven DDoS detection. Second, we employ a Multilayer Perceptron with a Genetic Algorithm (MLP-GA) to estimate detection efficiency using these metrics. The experimental results show that MLP-GA provides the best efficiency, 98.04%, for detecting layer-seven DDoS attacks. The proposed method yields a minimum false-positive rate when compared with traditional classifiers such as Naive Bayes, Radial Basis Function (RBF) Network, MLP, J48, and C4.5.

40 citations

Journal ArticleDOI
TL;DR: This article proposes an ensemble feature selection algorithm to determine which attribute in the given training datasets is efficient in categorizing the classes and uses a multilayer perceptron classifier as the final classifier, as it provides better accuracy when compared to other conventional classification models.
Abstract: In the current cyber world, one of the most severe cyber threats is the distributed denial of service (DDoS) attack, which makes websites and other online resources unavailable to legitimate clients. It is different from other cyber threats that breach security parameters; DDoS is a short-term attack that brings down the server temporarily. Appropriate selection of features plays a crucial role for effective detection of DDoS attacks. Too many irrelevant features not only produce unrelated class categories but also increase computation overhead. In this article, we propose an ensemble feature selection algorithm to determine which attribute in the given training datasets is efficient in categorizing the classes. The result of the ensemble algorithm when compared to a threshold value will enable us to decide the features. The selected features are deployed as training inputs for various classifiers to select a classifier that yields maximum accuracy. We use a multilayer perceptron classifier as the final classifier, as it provides better accuracy when compared to other conventional classification models. The proposed method classifies the new datasets into either attack or normal classes with an efficiency of 98.3% and also reduces the overall computation time. We use the CAIDA 2007 dataset to evaluate the performance of the proposed method using MATLAB and Weka 3.6 simulators.
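The threshold-based ensemble feature selection idea described above can be sketched minimally as follows (illustrative only; the score-averaging scheme and names are assumptions, not the article's exact algorithm):

```python
def ensemble_select(score_lists, threshold):
    """Combine feature-relevance scores from several rankers by
    averaging, then keep the indices of features whose mean score
    exceeds the threshold. Each inner list holds one ranker's
    normalized score per feature, in the same feature order."""
    n_features = len(score_lists[0])
    selected = []
    for j in range(n_features):
        mean = sum(scores[j] for scores in score_lists) / len(score_lists)
        if mean > threshold:
            selected.append(j)
    return selected
```

The surviving feature indices would then be used to project the training data before fitting the final classifier.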

19 citations


Cites methods from "Integrating Global and Local Applic..."

  • ...We introduce the application of various classifiers, such as multilayer perceptron (MLP) with the backpropagation method [25], naive Bayes [11], random forest [1], and radial basis function (RBF) network [5], to classify the dataset into attack and normal classes....


  • ...In this article, we deploy MLP, naive Bayes, RBF network, and random forest, which are machine learning classifiers for training the common features along with the target....



  • ...We also run Weka 3.6 with the incremental naive Bayes classifier [10] to compare the accuracy of the two models with their respective probability of RMSE generated as shown in Figure 10A and B....


References
Book
15 Oct 1992
TL;DR: A complete guide to the C4.5 system as implemented in C for the UNIX environment, which starts from simple core learning methods and shows how they can be elaborated and extended to deal with typical problems such as missing data and overfitting.
Abstract: From the Publisher: Classifier systems play a major role in machine learning and knowledge-based systems, and Ross Quinlan's work on ID3 and C4.5 is widely acknowledged to have made some of the most significant contributions to their development. This book is a complete guide to the C4.5 system as implemented in C for the UNIX environment. It contains a comprehensive guide to the system's use, the source code (about 8,800 lines), and implementation notes. The source code and sample datasets are also available on a 3.5-inch floppy diskette for a Sun workstation. C4.5 starts with large sets of cases belonging to known classes. The cases, described by any mixture of nominal and numeric properties, are scrutinized for patterns that allow the classes to be reliably discriminated. These patterns are then expressed as models, in the form of decision trees or sets of if-then rules, that can be used to classify new cases, with emphasis on making the models understandable as well as accurate. The system has been applied successfully to tasks involving tens of thousands of cases described by hundreds of properties. The book starts from simple core learning methods and shows how they can be elaborated and extended to deal with typical problems such as missing data and overfitting. Advantages and disadvantages of the C4.5 approach are discussed and illustrated with several case studies. This book and software should be of interest to developers of classification-based intelligent systems and to students in machine learning and expert systems courses.
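The core of the decision-tree induction described above is choosing the split attribute by information gain (which C4.5 refines into the gain ratio). A minimal sketch for nominal attributes, as an illustration rather than the book's implementation:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of the class distribution."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Reduction in class entropy after partitioning the cases
    by the values of one nominal attribute."""
    base = entropy(labels)
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    remainder = sum(len(g) / len(labels) * entropy(g)
                    for g in groups.values())
    return base - remainder

def best_split(rows, labels):
    """Attribute index with the highest information gain."""
    return max(range(len(rows[0])), key=lambda a: info_gain(rows, labels, a))
```

A tree is grown by applying `best_split` recursively to each partition until the cases in a node share one class.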

21,674 citations

Book
25 Oct 1999
TL;DR: This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining.
Abstract: Data Mining: Practical Machine Learning Tools and Techniques offers a thorough grounding in machine learning concepts as well as practical advice on applying machine learning tools and techniques in real-world data mining situations. This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining. Thorough updates reflect the technical changes and modernizations that have taken place in the field since the last edition, including new material on Data Transformations, Ensemble Learning, Massive Data Sets, Multi-instance Learning, plus a new version of the popular Weka machine learning software developed by the authors. Witten, Frank, and Hall include both tried-and-true techniques of today as well as methods at the leading edge of contemporary research. *Provides a thorough grounding in machine learning concepts as well as practical advice on applying the tools and techniques to your data mining projects *Offers concrete tips and techniques for performance improvement that work by transforming the input or output in machine learning methods *Includes the downloadable Weka software toolkit, a collection of machine learning algorithms for data mining tasks, in an updated, interactive interface. Algorithms in the toolkit cover: data pre-processing, classification, regression, clustering, association rules, visualization

20,196 citations

Journal ArticleDOI
TL;DR: This paper provides an introduction to the WEKA workbench, reviews the history of the project, and, in light of the recent 3.6 stable release, briefly discusses what has been added since the last stable version (Weka 3.4) released in 2003.
Abstract: More than twelve years have elapsed since the first public release of WEKA. In that time, the software has been rewritten entirely from scratch, evolved substantially and now accompanies a text on data mining [35]. These days, WEKA enjoys widespread acceptance in both academia and business, has an active community, and has been downloaded more than 1.4 million times since being placed on Source-Forge in April 2000. This paper provides an introduction to the WEKA workbench, reviews the history of the project, and, in light of the recent 3.6 stable release, briefly discusses what has been added since the last stable version (Weka 3.4) released in 2003.

19,603 citations


"Integrating Global and Local Applic..." refers methods in this paper

  • ...Eight well-known algorithms were used for the comparison: discretize simple Bayes [17], NB with kernel estimation [17], locally weighted naive Bayes [12], lazy Bayesian rule-learning algorithm [40], discretize NB [17], averaged one-dependence estimator [37], weightily averaged one-dependence estimator [19], hidden naive Bayes algorithm [20]....


01 Jan 2007

17,341 citations

Book
01 Jan 2008
TL;DR: In this paper, generalized estimating equations (GEE) with computing using PROC GENMOD in SAS and multilevel analysis of clustered binary data using generalized linear mixed-effects models with PROC LOGISTIC are discussed.
Abstract: tic regression, and it concerns studying the effect of covariates on the risk of disease. The chapter includes generalized estimating equations (GEE’s) with computing using PROC GENMOD in SAS and multilevel analysis of clustered binary data using generalized linear mixed-effects models with PROC LOGISTIC. As a prelude to the following chapter on repeated-measures data, Chapter 5 presents time series analysis. The material on repeated-measures analysis uses linear additive models with GEE’s and PROC MIXED in SAS for linear mixed-effects models. Chapter 7 is about survival data analysis. All computing throughout the book is done using SAS procedures.

9,995 citations