
Showing papers by "Hossam Faris" published in 2017


Journal Article · DOI
TL;DR: The qualitative and quantitative results demonstrate the efficiency of the Salp Swarm Algorithm (SSA) and its multi-objective variant (MSSA), and show the merits of the proposed algorithms in solving real-world problems with difficult and unknown search spaces.

3,027 citations


Journal Article · DOI
TL;DR: The comprehensive results and comparisons reveal that EPD has a marked impact on the efficacy of GOA, and that the selection mechanism enhances the ability of the proposed approach to outperform other optimizers and find the best solutions with improved convergence trends.
Abstract: Searching for the optimal subset of features is a challenging problem in the feature selection process. Dealing with the difficulties involved in this problem requires a robust and reliable optimization algorithm. In this paper, the Grasshopper Optimization Algorithm (GOA) is employed as a search strategy to design a wrapper-based feature selection method. GOA is a recent population-based metaheuristic that mimics the swarming behavior of grasshoppers. In this work, an efficient optimizer based on the simultaneous use of GOA, selection operators, and Evolutionary Population Dynamics (EPD) is proposed in the form of four different strategies to mitigate the premature convergence and stagnation drawbacks of the conventional GOA. In the first two approaches, one of the top three agents and a randomly generated one are selected to reposition a solution from the worst half of the population. In the third and fourth approaches, Roulette Wheel Selection (RWS) and Tournament Selection (TS) are used to select the guiding agent from the first half, giving low-fitness solutions a chance to reshape the population. The proposed GOA_EPD approaches are employed to tackle various feature selection tasks and are benchmarked on 22 UCI datasets. The comprehensive results and comparisons reveal that EPD has a marked impact on the efficacy of GOA, and that the selection mechanism enhances the ability of the proposed approach to outperform other optimizers and find the best solutions with improved convergence trends. Furthermore, comparative experiments demonstrate the superiority of the proposed approaches over similar methods in the literature.
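The abstract says the third and fourth strategies pick a guiding agent from the better half of the population with RWS or TS. The sketch below shows only that selection step, under stated assumptions: costs are minimised (for example a wrapper's classification error), roulette weights are taken as 1/cost, and the tournament size `k=2` is a hypothetical default. The helper names are illustrative, not the authors' code, and how the chosen guide then repositions a worst-half solution is not specified in the abstract, so it is not shown.

```python
import random

def roulette_wheel_select(costs, rng):
    """Pick an index with probability proportional to 1/cost (lower cost = fitter).
    Assumes all costs are strictly positive."""
    weights = [1.0 / c for c in costs]
    r = rng.uniform(0.0, sum(weights))
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if r <= acc:
            return i
    return len(weights) - 1  # guard against floating-point round-off

def tournament_select(costs, k, rng):
    """Sample k distinct indices and return the one with the lowest cost."""
    contestants = rng.sample(range(len(costs)), k)
    return min(contestants, key=lambda i: costs[i])

def pick_guide(population, costs, method, rng, k=2):
    """Choose a guiding agent from the better (first) half of the population."""
    order = sorted(range(len(population)), key=lambda i: costs[i])
    first_half = order[: len(order) // 2]
    half_costs = [costs[i] for i in first_half]
    if method == "rws":
        j = roulette_wheel_select(half_costs, rng)
    elif method == "ts":
        j = tournament_select(half_costs, min(k, len(half_costs)), rng)
    else:
        raise ValueError("method must be 'rws' or 'ts'")
    return population[first_half[j]]
```

Here each agent is a binary feature mask; for example, `pick_guide([[1, 0], [0, 1], [1, 1], [0, 0]], [0.4, 0.1, 0.3, 0.2], "rws", random.Random(0))` always returns one of the two lowest-cost masks, but not necessarily the best one, which is the point of using stochastic selection.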

341 citations


Book Chapter · DOI
01 Jan 2017
Abstract: This book chapter proposes a new training algorithm for Radial Basis Function (RBF) networks using a recently proposed optimization algorithm called the Moth–Flame Optimizer (MFO). After MFO is formulated as an RBF network (RBFN) trainer, seven standard binary classification datasets are employed as case studies. The MFO-based trainer is compared with Particle Swarm Optimization (PSO), the Genetic Algorithm (GA), the Bat Algorithm (BA), and newrb (MATLAB's RBF network design function). The results show that the proposed trainer achieves superior results on the majority of the case studies. The observed convergence behavior shows that the new trainer also converges faster.
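One common way to "formulate MFO as an RBFN trainer" is to flatten all network parameters into a single vector and let the optimizer minimise the training mean squared error. The sketch below assumes that encoding (Gaussian units, one output, a hypothetical layout of centres, widths, output weights, bias) and uses a basic MFO with the usual logarithmic spiral and a linearly shrinking flame count. It is an illustration under those assumptions, not the chapter's implementation or settings.

```python
import math
import random

def rbf_predict(params, x, n_hidden):
    """Single-output Gaussian RBF network.
    Hypothetical parameter layout: n_hidden centres (len(x) values each),
    then n_hidden widths, n_hidden output weights and one bias."""
    d = len(x)
    off = n_hidden * d
    out = params[off + 2 * n_hidden]  # bias
    for h in range(n_hidden):
        centre = params[h * d:(h + 1) * d]
        width = params[off + h]
        weight = params[off + n_hidden + h]
        dist2 = sum((xi - ci) ** 2 for xi, ci in zip(x, centre))
        out += weight * math.exp(-dist2 / (2.0 * width * width + 1e-12))
    return out

def rbf_mse(params, X, y, n_hidden):
    """Mean squared error of the network on (X, y): the fitness MFO minimises."""
    errors = [(rbf_predict(params, x, n_hidden) - t) ** 2 for x, t in zip(X, y)]
    return sum(errors) / len(errors)

def mfo(fitness, dim, lb, ub, n_moths=20, max_iter=100, b=1.0, seed=0):
    """Basic Moth-Flame Optimizer for minimisation within [lb, ub]^dim."""
    rng = random.Random(seed)
    moths = [[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(n_moths)]
    flames, flame_fit = [], []
    for it in range(1, max_iter + 1):
        moth_fit = [fitness(m) for m in moths]
        # Flames = best n_moths solutions seen so far (previous flames + current moths).
        pool = sorted(zip(flame_fit + moth_fit, flames + [m[:] for m in moths]),
                      key=lambda pair: pair[0])[:n_moths]
        flame_fit = [f for f, _ in pool]
        flames = [s for _, s in pool]
        # The number of flames shrinks linearly from n_moths to 1.
        flame_no = round(n_moths - it * (n_moths - 1) / max_iter)
        # a decreases linearly from -1 towards -2; t is drawn from [a, 1].
        a = -1.0 - it / max_iter
        for i in range(n_moths):
            flame = flames[min(i, flame_no - 1)]
            for j in range(dim):
                dist = abs(flame[j] - moths[i][j])
                t = (a - 1.0) * rng.random() + 1.0
                # Logarithmic spiral flight of the moth around its flame.
                pos = dist * math.exp(b * t) * math.cos(2 * math.pi * t) + flame[j]
                moths[i][j] = min(max(pos, lb), ub)
    return flames[0], flame_fit[0]
```

For a network with 2 hidden units on 2-D inputs the vector has 2·2 + 2 + 2 + 1 = 9 entries, so training would look like `mfo(lambda p: rbf_mse(p, X, y, 2), dim=9, lb=-1.0, ub=1.0)`; the bounds here are an arbitrary choice for the sketch.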

42 citations


Book Chapter · DOI
26 Apr 2017
TL;DR: The software defect prediction problem is formulated as a classification task, the impact of several ensemble methods on classification effectiveness is examined, and the proposed hybrid method is shown to effectively enhance defect prediction accuracy.
Abstract: Software defect prediction is the process of identifying new defects (bugs) in software modules. A software defect is an error in a computer program caused by incorrect code or incorrect programming logic. Undiscovered defects lead to poor-quality software products. In recent years, software defect prediction has received considerable attention from researchers. Most previous defect detection algorithms suffer from low defect detection rates. Furthermore, software defect prediction is a very challenging problem because of the highly imbalanced class distribution: bug-free modules far outnumber defective ones. In this paper, the software defect prediction problem is formulated as a classification task, and the impact of several ensemble methods on classification effectiveness is examined. In addition, the best ensemble classifier is selected and retrained on datasets oversampled with the Synthetic Minority Over-sampling Technique (SMOTE) to tackle the imbalanced distribution problem. The proposed hybrid method is evaluated on four software defect datasets. Experimental results demonstrate that the proposed method can effectively enhance defect prediction accuracy.
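SMOTE balances the classes by creating synthetic minority (defective) samples rather than duplicating existing ones. The sketch below is a simplified variant, assuming numeric feature vectors: pick a random minority sample, pick one of its `k` nearest minority neighbours, and interpolate a new point on the segment between them. The function name and defaults are illustrative; in practice the imbalanced-learn library's `SMOTE` class provides a maintained implementation.

```python
import math
import random

def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority-class samples by interpolating between a
    randomly chosen minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> nn, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nn)])
    return synthetic
```

To fully balance a training split, `n_new` would be the number of bug-free modules minus the number of defective ones. Oversampling should be applied to the training data only, so that synthetic points never leak into the evaluation set.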

34 citations


Journal Article · DOI
TL;DR: The obtained models are able to predict sales from pre-publication data with remarkable accuracy and can be used as decision-aid tools for publishers, providing reliable guidance in the decision process of publishing a book.
Abstract: When a new book is launched, the publisher faces the problem of how many copies to print for delivery to bookstores. Printing too many is the main issue, since it implies a loss of investment due to excess inventory, but printing too few also has a negative economic impact. In this paper, we tackle the problem of predicting total sales in order to print the right number of copies, doing so even before the book reaches the stores. A real dataset including the complete sales data for books published in Spain across several years has been used. We conducted the analysis in three stages: an initial exploratory analysis by means of data visualisation techniques; a feature selection process, using different techniques to find out which variables have the most impact on sales; and a regression (prediction) stage, in which a set of machine learning methods was applied to create forecasting models for book sales. The obtained models are able to predict sales from pre-publication data with remarkable accuracy and can be visualised as simple decision trees. Thus, they can be used as decision-aid tools for publishers, providing reliable guidance in the decision process of publishing a book. The paper also shows this by addressing four example cases of representative publishers, chosen according to their number of sales and the number of different books they sell.
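The abstract notes that the sales models can be visualised as simple decision trees. As a rough illustration of how a regression tree chooses each node, the sketch below finds the single threshold on one numeric pre-publication feature that minimises the summed squared error when each side predicts its mean sales. The feature and the numbers in the usage note are invented toy values, not data from the paper, and a full tree would apply this search recursively over all features.

```python
def mean(values):
    return sum(values) / len(values)

def sse(values):
    """Sum of squared deviations from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(xs, ys):
    """Return (threshold, total_sse, left_mean, right_mean) for the split of one
    numeric feature that minimises the summed squared error of the two leaves,
    or None if no split is possible."""
    pairs = sorted(zip(xs, ys))
    best = None
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no threshold separates identical feature values
        left = [y for _, y in pairs[:k]]
        right = [y for _, y in pairs[k:]]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            threshold = (pairs[k - 1][0] + pairs[k][0]) / 2
            best = (threshold, total, mean(left), mean(right))
    return best
```

With toy inputs `xs = [1, 2, 3, 10, 11, 12]` and `ys = [5, 5, 5, 50, 50, 50]`, the best threshold is 6.5, with leaf predictions 5 and 50; each leaf of a small tree like this is a readable rule, which is what makes such models usable as decision aids.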

32 citations


Journal Article · DOI
01 Nov 2017
TL;DR: Bidirectional echo state reservoir networks (Bi-ESNs) are trained using the support vector machine with privileged information method (SVM+) to model a winding machine process; the results show that Bi-ESNs trained with SVM+ are promising and provide better generalization performance than the other models.
Abstract: In the last decade, a wide range of machine learning approaches have been proposed and tested for modeling highly nonlinear manufacturing processes. However, improving the performance of such models is challenging due to the complexity and high dimensionality of manufacturing processes in general. In this paper, we propose bidirectional echo state reservoir networks (Bi-ESNs) trained using the support vector machine with privileged information method (SVM+) to model a winding machine process. The proposed model is applied, tested, and compared with models reported in the literature, such as the classical ESN with linear regression, ESN with a linear SVM readout, genetic programming, a feedforward neural network with backpropagation, a radial basis function network, an adaptive neural fuzzy inference system, and a local linear wavelet neural network. The results show that Bi-ESNs trained with SVM+ are promising: they provide better generalization performance than the other models.
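A bidirectional reservoir lets each time step's features depend on both past and future inputs. The sketch below assumes one common construction: run a fixed random reservoir forward over the sequence, run it again over the reversed sequence, re-align the backward states to the original time order, and concatenate. Reusing one reservoir for both directions, the density, and the 0.9 row-norm scaling are all assumptions for illustration. The SVM+ readout from the paper is not implemented here; it would be trained on the concatenated states.

```python
import math
import random

def make_reservoir(n_res, n_in, density=0.2, seed=0):
    """Random sparse reservoir, rescaled so every non-empty row's absolute sum is 0.9.
    An infinity norm below 1 is a conservative sufficient condition for the
    echo state property with tanh units."""
    rng = random.Random(seed)
    W = [[rng.uniform(-1, 1) if rng.random() < density else 0.0 for _ in range(n_res)]
         for _ in range(n_res)]
    for row in W:
        s = sum(abs(w) for w in row)
        if s > 0:
            for j in range(n_res):
                row[j] *= 0.9 / s
    W_in = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_res)]
    return W, W_in

def run(W, W_in, inputs):
    """Collect states x(t) = tanh(W_in u(t) + W x(t-1)) starting from x = 0."""
    n = len(W)
    x = [0.0] * n
    states = []
    for u in inputs:
        x = [math.tanh(sum(wi * ui for wi, ui in zip(W_in[r], u)) +
                       sum(w * xv for w, xv in zip(W[r], x))) for r in range(n)]
        states.append(x)
    return states

def bidirectional_states(W, W_in, inputs):
    """Concatenate forward states with backward states (reservoir run over the
    reversed sequence, then re-aligned to the original time order)."""
    forward = run(W, W_in, inputs)
    backward = run(W, W_in, inputs[::-1])[::-1]
    return [f + b for f, b in zip(forward, backward)]
```

With a 10-unit reservoir, each time step yields a 20-dimensional feature vector; the readout (SVM+ in the paper, or a linear model in the classical ESN baseline) maps these vectors to the process output.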

28 citations


Proceedings Article · DOI
01 Oct 2017
TL;DR: This paper discusses the most influential spam features reported in the literature and describes the development and implementation of an open-source tool that provides a flexible way to extract a large number of features from any email corpus, producing a cleansed dataset that can be used to train and test various classification algorithms.
Abstract: Recently, a wide range of Machine Learning (ML) algorithms have been proposed for building email spam detection models. However, the performance of ML methods depends heavily on the extracted features. In this paper, we discuss the most influential spam features reported in the literature. We also describe the development and implementation of an open-source tool that provides a flexible way to extract a large number of features from any email corpus, producing a cleansed dataset that can be used to train and test various classification algorithms. A total of 140 features are extracted from the SpamAssassin email corpus using the developed tool. The extracted features are used to evaluate four popular ML classifiers, and better results are achieved than in a similar previous study.

24 citations


Proceedings Article · DOI
01 Oct 2017
TL;DR: This paper proposes an Android botnet detection method based on a new set of discriminating features extracted from the analysis of Android permissions; these features yield a slight performance improvement for the decision tree and Random Forest classifiers.
Abstract: Android is one of the most popular and widespread operating systems for smartphones. Millions of applications are published for it in both official and unofficial stores. Botnet applications are a kind of malware that can be published through these stores and downloaded by victims onto their smartphones. In this paper, we propose an Android botnet detection method based on a new set of discriminating features extracted from the analysis of Android permissions (i.e., the protection levels of all available Android permissions). We then compare the prediction power of different machine learning models before and after adding these features to the state-of-the-art requested-permission features. We used four popular ML classifiers (namely Random Forest, Multilayer Perceptron neural networks, decision trees, and Naive Bayes) in our experiments and found that the new features yield a slight performance improvement for the decision tree and Random Forest classifiers.
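To make "requested-permission features plus protection-level features" concrete, the sketch below turns an app's requested permissions into binary indicators and then appends a count of requested permissions per protection level. The encoding as per-level counts is an assumption about how such features could be built, not necessarily the paper's; the permission table is a small hand-picked subset for illustration, whereas a real extractor would cover every platform permission.

```python
from collections import Counter

# Illustrative subset of permission -> protection level (not exhaustive).
PROTECTION_LEVEL = {
    "android.permission.INTERNET": "normal",
    "android.permission.ACCESS_NETWORK_STATE": "normal",
    "android.permission.RECEIVE_BOOT_COMPLETED": "normal",
    "android.permission.READ_SMS": "dangerous",
    "android.permission.SEND_SMS": "dangerous",
    "android.permission.READ_CONTACTS": "dangerous",
    "android.permission.BIND_DEVICE_ADMIN": "signature",
}
LEVELS = ("normal", "dangerous", "signature", "signatureOrSystem")
VOCABULARY = tuple(sorted(PROTECTION_LEVEL))

def permission_features(requested):
    """Binary requested-permission indicators followed by per-protection-level counts."""
    requested = set(requested)
    binary = [int(p in requested) for p in VOCABULARY]
    counts = Counter(PROTECTION_LEVEL[p] for p in requested if p in PROTECTION_LEVEL)
    return binary + [counts[level] for level in LEVELS]
```

For an app requesting `INTERNET`, `SEND_SMS` and `READ_SMS`, the last four entries are `[1, 2, 0, 0]`. The resulting vectors can be fed to any of the four classifiers named above, with and without the level counts, to reproduce the kind of before/after comparison the abstract describes.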

16 citations


Book Chapter · DOI
26 Apr 2017
TL;DR: This work empirically evaluates the performance of CRJ on time series forecasting problems and compares it with ESN and Nonlinear Auto-Regressive with eXogenous inputs (NARX) models.
Abstract: The cycle reservoir with regular jumps (CRJ) is a recent deterministic reservoir model with a very simple structure and highly constrained weight values. CRJ was proposed as an alternative to the randomized Echo State Network (ESN) reservoir. In this work, we empirically evaluate the performance of CRJ on time series forecasting problems and compare it with ESN and Nonlinear Auto-Regressive with eXogenous inputs (NARX) models. The comparison is conducted on seven time series datasets that represent different real-world cases. Simulation results show that CRJ outperforms the ESN and NARX models. The results also demonstrate the effectiveness of CRJ when applied to different time series forecasting problems.
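The "very simple structure and highly constrained weight values" of CRJ can be shown directly: every cycle connection shares one weight, every jump shares another, and all input weights share one magnitude. The sketch below builds such a reservoir matrix and one state update; the parameter names `r_c`, `r_j` and `jump` are illustrative, and it assumes `n` is a multiple of `jump` for a perfectly regular layout (otherwise the last jump simply wraps around). CRJ fixes input-weight signs deterministically; the alternating pattern used in the usage note is only a placeholder assumption.

```python
import math

def crj_reservoir(n, r_c, r_j, jump):
    """Cycle Reservoir with regular Jumps: a unidirectional cycle with weight r_c
    plus bidirectional jumps with weight r_j between units `jump` apart."""
    if not (2 <= jump < n - 1):
        raise ValueError("need 2 <= jump < n - 1 so jumps do not overlap the cycle")
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        W[(i + 1) % n][i] = r_c  # unit i feeds unit i+1 (cycle)
    for i in range(0, n, jump):
        j = (i + jump) % n
        W[i][j] = r_j  # jump connections go both ways
        W[j][i] = r_j
    return W

def step(W, w_in, x, u):
    """One reservoir update for a scalar input u: x' = tanh(w_in*u + W x)."""
    return [math.tanh(w_in[r] * u + sum(w * xv for w, xv in zip(W[r], x)))
            for r in range(len(W))]
```

For example, `crj_reservoir(8, 0.7, 0.3, 2)` has 8 cycle weights and 8 jump weights and nothing else, so the whole reservoir is described by two numbers plus the jump size, in contrast to the randomly drawn ESN reservoir. An input vector such as `[0.5 * (1 if r % 2 == 0 else -1) for r in range(8)]` gives all units the same input magnitude with fixed signs.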

7 citations


Posted Content · DOI
TL;DR: The purpose of the tool is to help practitioners and researchers to build datasets that can be used for training machine learning models for spam detection.
Abstract: EMFET is an open-source, flexible tool that can be used to extract a large number of features from any email corpus whose emails are saved in EML format. The extracted features fall into three main groups: header features, payload (body) features, and attachment features. The purpose of the tool is to help practitioners and researchers build datasets that can be used for training machine learning models for spam detection. So far, 140 features can be extracted using EMFET. EMFET is extensible and easy to use. The source code of EMFET is publicly available at GitHub (this https URL).
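The three feature groups (header, payload, attachment) can be illustrated with Python's standard `email` package, which parses EML files. The sketch below extracts a handful of example features per group; it is not EMFET's code, the feature names are hypothetical, and it does not reproduce EMFET's 140 features.

```python
import email
import re
from email import policy

URL_RE = re.compile(r"https?://\S+", re.IGNORECASE)

def extract_features(raw_eml: bytes) -> dict:
    """Extract a few illustrative header, payload and attachment features."""
    msg = email.message_from_bytes(raw_eml, policy=policy.default)
    subject = str(msg["Subject"] or "")
    to_header = msg["To"]
    features = {
        # Header features
        "subject_length": len(subject),
        "subject_exclamations": subject.count("!"),
        "num_recipients": len(to_header.addresses) if to_header is not None else 0,
        "has_reply_to": int(msg["Reply-To"] is not None),
        # Payload (body) features
        "body_length": 0,
        "num_urls": 0,
        "has_html": 0,
        # Attachment features
        "num_attachments": 0,
    }
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no content of their own
        if part.get_content_disposition() == "attachment":
            features["num_attachments"] += 1
            continue
        ctype = part.get_content_type()
        if ctype in ("text/plain", "text/html"):
            text = part.get_content()  # decoded body text
            features["body_length"] += len(text)
            features["num_urls"] += len(URL_RE.findall(text))
            if ctype == "text/html":
                features["has_html"] = 1
    return features
```

Applying `extract_features` to every `.eml` file in a corpus and writing one row per message gives a tabular dataset ready for classifier training, which is the workflow the abstract describes.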

2 citations