Proceedings Article

An effective hybridized classifier for breast cancer diagnosis

07 Jul 2015, pp. 1026-1031
TL;DR: The paper proposes an effective hybridized classifier for breast cancer diagnosis made by combining an unsupervised artificial neural network method named self organizing maps (SOM) with a supervised classifier called stochastic gradient descent (SGD).
Abstract: After lung cancer, breast cancer is known to be the greatest cause of death among females [20]. Medical practitioners are giving considerable weight to the improving effectiveness of machine learning approaches for breast cancer diagnosis. The paper proposes an effective hybridized classifier for breast cancer diagnosis. The classifier is made by combining an unsupervised artificial neural network (ANN) method named self organizing maps (SOM) with a supervised classifier called stochastic gradient descent (SGD). A comparative analysis is also performed between the proposed approach and three supervised state-of-the-art machine learning techniques: decision trees (DTs), random forests (RF) and support vector machines (SVM). Initially, the SGD method is used in isolation for the classification task; it is then made to perform the classification after being hybridized with the unsupervised ANN technique on the Wisconsin Breast Cancer Database (WBCD) [10]. The comparison is based on classification accuracy, which is computed from a confusion matrix. To verify the consistency of the accuracy values, the classification task was repeated with the Internet Advertisements Dataset [11]. In the classification experiments, the hybridization of SOM with SGD gives considerably better results than SGD in isolation. All accuracy values were computed using ten-fold cross-validation on both datasets to further verify the classifier's performance.
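
The evaluation protocol described in the abstract (SGD alone versus SOM combined with SGD, scored by confusion-matrix accuracy under ten-fold cross-validation) can be sketched roughly as follows. This is only an illustration under several assumptions, not the authors' implementation: the abstract does not say how the SOM output is fed to SGD, so the sketch simply appends each sample's distances to the SOM units as extra features; `TinySOM` and `cv_accuracy` are hypothetical helpers written for this example; and scikit-learn's bundled `load_breast_cancer` data (the Wisconsin diagnostic set) stands in for the WBCD file used in the paper.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


class TinySOM(TransformerMixin, BaseEstimator):
    """Hypothetical minimal SOM: appends distances to every map unit as features."""

    def __init__(self, rows=4, cols=4, n_iter=2000, lr=0.5, sigma=1.5, seed=0):
        self.rows = rows
        self.cols = cols
        self.n_iter = n_iter
        self.lr = lr
        self.sigma = sigma
        self.seed = seed

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        rng = np.random.default_rng(self.seed)
        # Grid coordinates of the map units, shape (rows * cols, 2).
        self.grid_ = np.array(
            [(r, c) for r in range(self.rows) for c in range(self.cols)], dtype=float
        )
        # Initialise unit weights from randomly chosen training samples.
        self.weights_ = X[rng.integers(0, len(X), len(self.grid_))].copy()
        for t in range(self.n_iter):
            frac = 1.0 - t / self.n_iter  # linear decay of rate and radius
            x = X[rng.integers(0, len(X))]
            # Best-matching unit: the unit whose weights are closest to x.
            bmu = np.argmin(((self.weights_ - x) ** 2).sum(axis=1))
            grid_d2 = ((self.grid_ - self.grid_[bmu]) ** 2).sum(axis=1)
            sigma = self.sigma * frac + 1e-3
            h = np.exp(-grid_d2 / (2.0 * sigma ** 2))  # neighbourhood function
            self.weights_ += (self.lr * frac) * h[:, None] * (x - self.weights_)
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        diff = X[:, None, :] - self.weights_[None, :, :]
        dists = np.sqrt((diff ** 2).sum(axis=2))
        return np.hstack([X, dists])  # original features plus SOM distances


def cv_accuracy(model, X, y):
    """Ten-fold cross-validated confusion matrix and the accuracy derived from it."""
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    pred = cross_val_predict(model, X, y, cv=cv)
    cm = confusion_matrix(y, pred)
    return cm, np.trace(cm) / cm.sum()  # correct predictions lie on the diagonal


X, y = load_breast_cancer(return_X_y=True)  # stand-in dataset (assumption)
plain = make_pipeline(StandardScaler(), SGDClassifier(random_state=0))
hybrid = make_pipeline(
    StandardScaler(), TinySOM(), StandardScaler(), SGDClassifier(random_state=0)
)
cm_plain, acc_plain = cv_accuracy(plain, X, y)
cm_hybrid, acc_hybrid = cv_accuracy(hybrid, X, y)
print("SGD alone :", round(float(acc_plain), 4))
print("SOM + SGD :", round(float(acc_hybrid), 4))
```

Placing the SOM inside the pipeline means it is refit on the training part of every fold, so the held-out fold never influences the map. Whether the hybrid beats plain SGD in this sketch depends on the dataset and settings; it makes no claim to reproduce the paper's figures.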
Citations
Journal Article
TL;DR: This study found that of the six medical tasks that exist, the diagnosis medical task was that most frequently researched, and that the experiment-based empirical type and evaluation-based research type were the most dominant approaches adopted in the selected studies.

128 citations

Book Chapter
17 Jan 2017
TL;DR: The experimental results show that the accuracy of intrusion detection using Deep Neural Network is satisfactory and the potential capability of Deep Neural network as a classifier for the different types of intrusion attacks is checked.
Abstract: Security of data is considered to be one of the most important concerns in today’s world. Data is vulnerable to various types of intrusion attacks that may reduce the utility of any network or system. The constantly changing and complicated nature of intrusion activities on computer networks cannot be dealt with by the IDSs that are currently operational. Identifying and preventing such attacks is one of the most challenging tasks. Deep learning is one of the most effective machine learning techniques and has recently been gaining popularity. This paper checks the potential capability of a Deep Neural Network as a classifier for the different types of intrusion attacks. A comparative study has also been carried out with Support Vector Machine (SVM). The experimental results show that the accuracy of intrusion detection using the Deep Neural Network is satisfactory.

102 citations

Journal Article
TL;DR: An ensemble-based intrusion detection model is proposed in which logistic regression, naive Bayes, and decision tree are deployed with a voting classifier after analyzing the model’s performance against some prominent existing state-of-the-art techniques; the results show a significant improvement in accuracy over existing models.
Abstract: The domain of the Internet of Things (IoT) has witnessed immense adaptability over the last few years by drastically transforming human lives and automating ordinary daily tasks. This is achieved by interconnecting heterogeneous physical devices with different functionalities. Consequently, the rate of cyber threats has also risen with the expansion of IoT networks, which puts data integrity and stability at stake. In order to secure data from misuse and unusual attempts, several intrusion detection systems (IDSs) have been proposed to detect malicious activities on the basis of predefined attack patterns. The rapid increase in such attacks requires improvements in existing IDSs. Machine learning has become the key solution for improving intrusion detection systems. In this study, an ensemble-based intrusion detection model has been proposed. In the proposed model, logistic regression, naive Bayes, and decision tree have been deployed with a voting classifier after analyzing the model’s performance against some prominent existing state-of-the-art techniques. Moreover, the effectiveness of the proposed model has been analyzed using the CICIDS2017 dataset. The results illustrate a significant improvement in accuracy compared to existing models in both binary and multi-class classification scenarios.
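
A voting ensemble over logistic regression, naive Bayes, and a decision tree, as described in this abstract, might look roughly like the sketch below in scikit-learn. It makes several assumptions that the abstract does not confirm: hard (majority) voting, default hyperparameters, a Gaussian naive Bayes variant, and synthetic data from `make_classification` in place of CICIDS2017.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data as a stand-in for real network-traffic features (assumption).
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Each base model casts one vote; the majority class wins (hard voting, assumed).
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("nb", GaussianNB()),
        ("dt", DecisionTreeClassifier(random_state=0)),
    ],
    voting="hard",
)
ensemble.fit(X_train, y_train)
ensemble_acc = ensemble.score(X_test, y_test)  # fraction of correct test predictions
print("ensemble accuracy:", ensemble_acc)
```

Switching to `voting="soft"` would average predicted class probabilities instead of counting votes; all three base models here expose `predict_proba`, so either mode works.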

43 citations

Book Chapter
24 Aug 2016
TL;DR: Evaluation of the prediction models indicates that the Multivariate Adaptive Regression Splines model describes the dataset better and has achieved significantly better prediction accuracy as compared to the Random Forest and Classification and Regression Tree.
Abstract: Air pollution is one of the major environmental worries in recent times. An abrupt increase in the concentration of any gas leads to air pollution. Cities are the most affected because of their dense populations. One of the worst gaseous pollutants is ozone (O3). In this paper, we propose three predictive models for estimating the concentration of ozone in the air: Random Forest, Multivariate Adaptive Regression Splines (MARS) and Classification and Regression Tree. Evaluation of the prediction models indicates that the Multivariate Adaptive Regression Splines model describes the dataset better and has achieved significantly better prediction accuracy as compared to the Random Forest and Classification and Regression Tree. A detailed comparative study has been carried out on the performances of Random Forest, Multivariate Adaptive Regression Splines and Classification and Regression Tree. MARS gives its result by considering fewer variables than the other two. Moreover, Random Forest takes a little more time to build the trees; the elapsed time was calculated to be 45 s in this case. In addition, variable importance for each model has been estimated. Observing all the graphs, Multivariate Adaptive Regression Splines gives the closest curves for the train and test sets when compared. It can be concluded that multivariate adaptive regression splines can be a valuable tool for predicting ozone in the future.

12 citations

Book Chapter
24 Aug 2016
TL;DR: Three classification models Naive Bayes, MultiClass Classifier, K-Star and IBK are adopted as potential classifiers for prediction of customer satisfaction at San Francisco International Airport to find the least amount of deviation from the actual values.
Abstract: Customer satisfaction is an important term in business as well as marketing, as it indicates how well customer expectations have been met by the product or the service. A good prediction model for customer satisfaction can therefore help any organization make better decisions with respect to its services and work in a more informed manner to improve them. The problem considered in this study is optimization of customer satisfaction for the customers of San Francisco International Airport. This paper adopts three classification models Naive Bayes, MultiClass Classifier, K-Star and IBK as potential classifiers for prediction of customer satisfaction. Customer satisfaction depends on various factors. The factors which we consider are the user ratings for artwork and exhibitions, restaurants, variety stores, concessions, signage, directions inside SFO, information booths near baggage claim and departure, Wi-Fi, parking facilities, walkways, air train and an overall rating for the airport services. The ratings are obtained from a detailed customer survey conducted by the airport in 2015. The original survey included questions on airlines, destination airport, flight delays, conveyance to and from the airport, security/immigration etc., but our study focuses on the previously mentioned questions. Graphs are plotted for actual and predicted values and compared to find the least amount of deviation from the actual values. The model which shows the least deviation from actual values is considered optimal for the above-mentioned problem.

12 citations

References
Journal Article
TL;DR: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems, focusing on bringing machine learning to non-specialists using a general-purpose high-level language.
Abstract: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings. Source code, binaries, and documentation can be downloaded from http://scikit-learn.sourceforge.net.

47,974 citations

Posted Content
TL;DR: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems.
Abstract: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings. Source code, binaries, and documentation can be downloaded from this http URL.

28,898 citations

Book
01 Apr 2003

3,950 citations

Book Chapter
Léon Bottou
01 Jan 2012
TL;DR: This chapter provides background material, explains why SGD is a good learning algorithm when the training set is large, and provides useful recommendations.
Abstract: Chapter 1 strongly advocates the stochastic back-propagation method to train neural networks. This is in fact an instance of a more general technique called stochastic gradient descent (SGD). This chapter provides background material, explains why SGD is a good learning algorithm when the training set is large, and provides useful recommendations.
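
As a rough illustration of the technique this chapter discusses, the sketch below trains a logistic-regression classifier with plain stochastic gradient descent, updating the weights from one randomly ordered sample at a time. The function name `sgd_logistic`, the fixed learning rate, the epoch count, and the synthetic two-feature data are all assumptions made for the example and are not taken from the chapter.

```python
import numpy as np


def sgd_logistic(X, y, lr=0.1, epochs=20, seed=0):
    """Fit logistic regression by per-sample SGD (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        # Visit the samples in a fresh random order every epoch.
        for i in rng.permutation(len(X)):
            p = 1.0 / (1.0 + np.exp(-(X[i] @ w + b)))  # predicted probability
            g = p - y[i]  # gradient of the log loss w.r.t. the linear score
            w -= lr * g * X[i]
            b -= lr * g
    return w, b


# Synthetic, linearly separable toy data (assumption).
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w, b = sgd_logistic(X, y)
train_acc = np.mean(((X @ w + b) > 0) == (y == 1))
print("weights:", w, "bias:", b, "training accuracy:", train_acc)
```

Each update costs the same regardless of how many training samples there are, which is the property that makes the method attractive for large training sets.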

1,666 citations


"An effective hybridized classifier ..." cites this paper for background

  • ...Donald and Robert [5] conducted a classification on the data set of dense canopy pine plantation....

    [...]

Journal Article
TL;DR: The most notable characteristic of the descriptive epidemiology of breast cancer in recent years is perhaps the rapidly increasing incidence rates in developing countries.
Abstract: Breast cancer is the most common cancer among women in the United States. Knowledge of the descriptive epidemiology of breast cancer is useful both in suggesting etiologic hypotheses and, if preventive measures can be identified, in delineating high-risk groups to be targeted for preventive efforts. Demographic risk factors include increasing age (in Western countries), being white for breast cancer diagnosed at age 45 years or more, being black for breast cancer diagnosed at less than 40 years of age, high socioeconomic status, having never married, being of the Jewish faith, urban residence, and residence in the northern (as compared with the southern) United States. Incidence rates are generally highest in North American and Northern European countries, intermediate in Southern and Eastern European and South American countries, and lowest in Asia and Africa. The most notable characteristic of the descriptive epidemiology of breast cancer in recent years is perhaps the rapidly increasing incidence rates in developing countries. Identification of specific reasons for these increasing rates would contribute substantially to our understanding of the epidemiology of breast cancer.

414 citations


"An effective hybridized classifier ..." cites this paper for methods

  • ...The comparison is based up on classification accuracy that is produced by generating a confusion matrix....

    [...]