
Showing papers by "Hossam Faris" published in 2021


Journal ArticleDOI
TL;DR: This article aims to detect cyber hate speech in the Arabic context on the Twitter platform by applying Natural Language Processing (NLP) techniques and machine learning methods.
Abstract: Nowadays, cyber hate speech is increasingly growing, which forms a serious problem worldwide by threatening the cohesion of civil societies. Hate speech relates to using expressions or phrases that...

55 citations


Journal ArticleDOI
TL;DR: An approach of three stages considering a clustering with reduction stage, an oversampling stage, and a classification by a Single Hidden Layer Feed-Forward Neural Network (SLFN) stage for intrusion detection of IoT-based data shows better results than other values and other classification techniques.
Abstract: Intrusion detection of IoT-based data is a hot topic and has received a lot of interest from researchers and practitioners, since the security of IoT networks is crucial. Both supervised and unsupervised learning methods are used for intrusion detection of IoT networks. This paper proposes an approach of three stages: a clustering with reduction stage, an oversampling stage, and a classification stage using a Single Hidden Layer Feed-Forward Neural Network (SLFN). The novelty of the paper resides in the technique of data reduction and data oversampling for generating useful and balanced training data, and in the hybrid consideration of unsupervised and supervised methods for detecting intrusion activities. The experiments were evaluated in terms of accuracy, precision, recall, and G-mean and divided into four steps: measuring the effect of the data reduction with clustering, the evaluation of the framework with basic classifiers, the effect of the oversampling technique, and a comparison with basic classifiers. The results show that the SLFN classification technique, combined with the Support Vector Machine and Synthetic Minority Oversampling Technique (SVM-SMOTE) with a ratio of 0.9 and a k value of 3 for the k-means++ clustering technique, gives better results than other values and other classification techniques.
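The oversampling stage can be illustrated with a minimal sketch. It shows the interpolation idea behind plain SMOTE, not the SVM-SMOTE variant named in the abstract, and it assumes that a ratio of 0.9 means the minority class is grown to 0.9 times the majority class size. The function names and the toy data are hypothetical.

```python
import random

def smote_like_samples(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (plain SMOTE idea).
    Requires at least two minority points."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

def oversample_to_ratio(minority, n_majority, ratio=0.9, k=3, seed=0):
    """Grow the minority class until it has about ratio * n_majority points
    (assumed meaning of the ratio; the abstract does not define it)."""
    target = round(ratio * n_majority)
    n_new = max(0, target - len(minority))
    return minority + smote_like_samples(minority, n_new, k, seed)

# Hypothetical toy data: 4 minority points against 10 majority points.
minority_points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
balanced_minority = oversample_to_ratio(minority_points, n_majority=10)
```

Because every synthetic point lies on a segment between two existing minority points, the new samples stay inside the region the minority class already occupies.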

42 citations


Journal ArticleDOI
TL;DR: In this article, a new methodology for the detection of Ransomware that is depending on an evolutionary-based machine learning approach is introduced, where the binary particle swarm optimization algorithm is utilized for tuning the hyperparameters of the classification algorithm, as well as performing feature selection.
Abstract: In recent years, Ransomware has been a critical threat that attacks smartphones. Ransomware is a kind of malware that blocks the mobile’s system and prevents the user of the infected device from accessing their data until a ransom is paid. Worldwide, Ransomware attacks have led to serious losses for individuals and stakeholders. However, the dramatic increase of Ransomware families makes the process of identifying them more challenging due to their continuously evolving characteristics. Traditional malware detection methods (e.g., statistical-based prevention methods) fail to combat the evolving Ransomware since they result in a high percentage of false positives. Indeed, developing a non-classical, intelligent technique for safeguarding against Ransomware is of significant importance. This paper introduces a new methodology for the detection of Ransomware that depends on an evolutionary-based machine learning approach. The binary particle swarm optimization algorithm is utilized for tuning the hyperparameters of the classification algorithm, as well as performing feature selection. The support vector machines (SVM) algorithm is used alongside the synthetic minority oversampling technique (SMOTE) for classification. The utilized dataset is collected from various sources and consists of 10,153 Android applications, of which 500 are Ransomware. The proposed approach, SMOTE-tBPSO-SVM, outperformed traditional machine learning algorithms by achieving the highest scores in terms of sensitivity, specificity, and g-mean.
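As a rough illustration of how binary particle swarm optimization can drive feature selection, the sketch below performs one update of a single particle using the standard sigmoid transfer function. It is the textbook binary PSO step, not necessarily the variant used in the paper; the inertia and acceleration coefficients are assumed values, and the hyperparameter encoding is omitted.

```python
import math
import random

def bpso_step(position, velocity, personal_best, global_best,
              rng, w=0.7, c1=1.5, c2=1.5, v_max=4.0):
    """One standard binary PSO update for a single particle.

    position, personal_best and global_best are 0/1 lists in which a 1
    means that the feature at that index is selected."""
    new_position, new_velocity = [], []
    for x, v, p, g in zip(position, velocity, personal_best, global_best):
        # Usual PSO velocity update, clamped to [-v_max, v_max].
        v = w * v + c1 * rng.random() * (p - x) + c2 * rng.random() * (g - x)
        v = max(-v_max, min(v_max, v))
        # The sigmoid turns the velocity into the probability of a 1 bit.
        prob_one = 1.0 / (1.0 + math.exp(-v))
        new_velocity.append(v)
        new_position.append(1 if rng.random() < prob_one else 0)
    return new_position, new_velocity

rng = random.Random(1)
mask, vel = bpso_step([0, 1, 0, 1], [0.0] * 4, [1, 1, 0, 0], [1, 0, 0, 1], rng)
```

In a full wrapper, each mask would be scored by training the classifier on the selected features, and the personal and global bests would be updated from those scores.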

37 citations


Journal ArticleDOI
24 Apr 2021-Sensors
TL;DR: In this paper, a deep multi-layer classification approach for intrusion detection is proposed combining two stages of detection of the existence of an intrusion and the type of intrusion, along with an oversampling technique to ensure better quality of the classification results.
Abstract: The security of IoT networks is an important concern to researchers and business owners, which is taken into careful consideration due to its direct impact on the availability of the services offered by IoT devices and the privacy of the users connected with the network. An intrusion detection system ensures the security of the network and detects malicious activities attacking the network. In this study, a deep multi-layer classification approach for intrusion detection is proposed combining two stages of detection of the existence of an intrusion and the type of intrusion, along with an oversampling technique to ensure better quality of the classification results. Extensive experiments are made for different settings of the first stage and the second stage in addition to two different strategies for the oversampling technique. The experiments show that the best settings of the proposed approach include oversampling by the intrusion type identification label (ITI), 150 neurons for the Single-hidden Layer Feed-forward Neural Network (SLFN), and 2 layers and 150 neurons for LSTM. The results are compared to well-known classification techniques, which shows that the proposed technique outperforms the others in terms of the G-mean having the value of 78% compared to 75% for KNN and less than 50% for the other techniques.
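The G-mean reported above is commonly computed as the geometric mean of the per-class recalls, which penalizes a classifier that ignores a rare intrusion type. A minimal sketch under that assumption (the abstract does not spell out the formula, and the labels below are made up):

```python
import math

def g_mean(y_true, y_pred):
    """Geometric mean of per-class recall; 0 if any class is never detected."""
    recalls = []
    for c in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / total)
    return math.prod(recalls) ** (1.0 / len(recalls))

# Hypothetical labels: recall is 3/4 for "normal" and 1/2 for "dos".
y_true = ["normal"] * 4 + ["dos"] * 2
y_pred = ["normal", "normal", "normal", "dos", "dos", "normal"]
score = g_mean(y_true, y_pred)
```

Unlike accuracy, this score drops to zero as soon as one class has zero recall, which is why it is often preferred for imbalanced data.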

33 citations


Journal ArticleDOI
TL;DR: The nature and the characteristics of spam profiles in a social network like Twitter to improve spam detection, based on a number of publicly available language-independent features, are addressed, leading to a better understanding of social spam and improving detection methods by considering the various important features resulting from the different lingual contexts.
Abstract: In online social networks, spam profiles represent one of the most serious security threats over the Internet; if they do not stop producing bad advertisements, they can be exploited by criminals f...

27 citations


Journal ArticleDOI
TL;DR: The proposed XGBoost-GA model revealed an optimistic and superior predictability performance with a maximum coefficient of determination ($R^{2} = 0.933$) and a minimum root mean square error (RMSE) and demonstrated reliable feature selection for the essential physical parameters.

21 citations


Journal ArticleDOI
TL;DR: This paper proposes oriented stochastic loss descent (OSLD), which iteratively updates a randomly initialized parameter in the opposite direction of its partial derivative sign by a small positive random number, scaled by a tuned ratio of the model loss.
Abstract: Deep multi-layer neural networks represent hypotheses of very high degree polynomials to solve very complex problems. Gradient descent optimization algorithms are utilized to train such deep networks through backpropagation, which suffers from permanent problems such as the vanishing gradient problem. To overcome the vanishing problem, we introduce a new anti-vanishing back-propagated learning algorithm called oriented stochastic loss descent (OSLD). OSLD updates a random-initialized parameter iteratively in the opposite direction of its partial derivative sign by a small positive random number, which is scaled by a tuned ratio of the model loss. This paper compares OSLD to stochastic gradient descent algorithm as the basic backpropagation algorithm and Adam as one of the best backpropagation algorithms in five benchmark models. Experimental results show that OSLD is very competitive to Adam in small and moderate depth models, and OSLD outperforms Adam in very long models. Moreover, OSLD is compatible with current backpropagation architectures except for learning rates. Finally, OSLD is stable and opens more choices in front of the very deep multi-layer neural networks.
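The OSLD update rule described above can be sketched in a few lines. This is an interpretation of the abstract only: the random number is assumed to be uniform on [0, 1), and the ratio value and the toy quadratic loss are hypothetical.

```python
import random

def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)

def osld_step(params, grads, loss, ratio, rng):
    """Move each parameter against the sign of its partial derivative by a
    random positive amount scaled by ratio * loss (assumed uniform draw)."""
    return [p - sign(g) * rng.random() * ratio * loss
            for p, g in zip(params, grads)]

# Toy problem: minimise (w - 3)^2 starting from w = 0.
rng = random.Random(0)
w = [0.0]
for _ in range(200):
    loss = (w[0] - 3.0) ** 2
    grad = [2.0 * (w[0] - 3.0)]
    w = osld_step(w, grad, loss, ratio=0.05, rng=rng)
```

Only the sign of the gradient is used, so the step size does not shrink when gradient magnitudes vanish in deep layers; it is governed by the loss instead.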

20 citations


Journal ArticleDOI
TL;DR: In this paper, a credibility domain-based KG Embedding framework is proposed to capture a fusion of data related to politics domain and obtained from heterogeneous resources into a formal KG representation depicted by a politics domain ontology.
Abstract: Knowledge Graphs (KGs) have gained considerable attention recently from both academia and industry. In fact, incorporating graph technology and the copious of various graph datasets have led the research community to build sophisticated graph analytics tools, which has extended the application of KGs to tackle a plethora of real-life problems in dissimilar domains. Despite the abundance of the currently proliferated generic KGs, there is a vital need to construct domain-specific KGs. Further, quality and credibility should be assimilated in the process of constructing and augmenting KGs, particularly those propagated from mixed-quality resources such as social media data. For example, the amount of the political discourses in social media is overwhelming yet can be hijacked and misused by spammers to spread misinformation and false news. This paper presents a novel credibility domain-based KG Embedding framework. This framework involves capturing a fusion of data related to politics domain and obtained from heterogeneous resources into a formal KG representation depicted by a politics domain ontology. The proposed approach makes use of various knowledge-based repositories to enrich the semantics of the textual contents, thereby facilitating the interoperability of information. The proposed framework also embodies a domain-based social credibility module to ensure data quality and trustworthiness. The utility of the proposed framework is verified by means of experiments conducted on two constructed KGs. The KGs are then embedded in low-dimensional semantically-continuous space using several embedding techniques. The effectiveness of embedding techniques and social credibility module is further demonstrated and substantiated on link prediction, clustering, and visualisation tasks.

20 citations


Journal ArticleDOI
TL;DR: A parallel heterogeneous ensemble feature selection based on three well-regarded algorithms: genetic algorithm, particle swarm optimizer, and grey wolf optimizer is proposed, which shows that the proposed parallel approach improved the performance in terms of the prediction results and running time.
Abstract: Ensemble learning has emerged as a useful machine learning technique, which is based on the idea of combining the output of multiple models instead of using a single model. This practice is known as “diversity”, and it usually enhances the performance. On the other hand, the ensemble feature selection method is based on the same idea, where multiple feature subsets are combined to select an optimal subset of features. Learning methods have difficulties with the curse of dimensionality, which impacts the performance and increases the running time exponentially. To overcome this issue, we propose a parallel heterogeneous ensemble feature selection based on three well-regarded algorithms: genetic algorithm, particle swarm optimizer, and grey wolf optimizer. The proposed approach is based on four phases, namely the distribution phase, the parallel ensemble feature selection phase, the combining and aggregation phase, and the testing phase. Three implementations of the proposed approach are presented: a sequential approach running on the central processing unit (CPU), a parallel approach running on multi-core CPU, and a parallel approach running on multi-core CPU with graphics processing units (GPU). To assess the performance of the proposed approach, twenty-one large datasets were used. The results show that the proposed parallel approach improved the performance in terms of the prediction results and running time.
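One simple way to combine the feature subsets returned by several optimizers is majority voting. The abstract does not state which aggregation rule the paper uses, so the rule, the function name, and the example subsets below are all assumptions made for illustration.

```python
def aggregate_subsets(subsets, min_votes=2):
    """Keep every feature index selected by at least min_votes subsets."""
    votes = {}
    for subset in subsets:
        for feature in set(subset):
            votes[feature] = votes.get(feature, 0) + 1
    return sorted(f for f, count in votes.items() if count >= min_votes)

# Hypothetical subsets from the three optimizers named in the abstract.
ga_subset = [0, 2, 5]
pso_subset = [2, 3, 5]
gwo_subset = [1, 2]
combined = aggregate_subsets([ga_subset, pso_subset, gwo_subset])
```

Here features 2 and 5 survive because at least two of the three selectors agree on them.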

18 citations


Journal ArticleDOI
TL;DR: The goal of the proposed algorithm is to improve the quality of clustering results by finding a solution that maximizes the separation between different clusters and maximizes the cohesion between data points in the same cluster.
Abstract: Evolutionary algorithms have shown their powerful capabilities in different machine learning problems, including clustering, which is a growing area of research nowadays. In this paper, we propose an efficient clustering technique based on the evolution behavior of the genetic algorithm and an advanced variant of the nearest neighbor search technique based on assignment and election mechanisms. The goal of the proposed algorithm is to improve the quality of clustering results by finding a solution that maximizes the separation between different clusters and maximizes the cohesion between data points in the same cluster. Our proposed algorithm, which we refer to as “EvoNP”, is tested with 15 well-known data sets using 5 well-known external evaluation measures and is compared with 7 well-regarded clustering algorithms. The experiments are conducted in two phases: evaluation of the best fitness function for the algorithm and evaluation of the algorithm against other clustering algorithms. The results show that the proposed algorithm works well with the silhouette coefficient fitness function and outperforms the other algorithms for the majority of the data sets. The source code of EvoNP is available at http://evo-ml.com/evonp/.
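The silhouette coefficient mentioned as a fitness function rewards exactly the two properties named above: cohesion within a cluster and separation between clusters. The sketch below is a plain-Python version of the standard definition with Euclidean distance; it is not taken from the EvoNP source code, and the four toy points are hypothetical.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (needs >= 2 clusters)."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    def mean_dist(i, cluster):
        # Mean distance from point i to the members of cluster, excluding i.
        d = [math.dist(points[i], q)
             for j, (q, lab) in enumerate(zip(points, labels))
             if lab == cluster and j != i]
        return sum(d) / len(d) if d else None

    scores = []
    for i, own in enumerate(labels):
        a = mean_dist(i, own)                  # cohesion
        if a is None:                          # singleton cluster convention
            scores.append(0.0)
            continue
        b = min(mean_dist(i, c) for c in clusters if c != own)  # separation
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(pts, [0, 0, 1, 1])   # compact, well separated
bad = silhouette_score(pts, [0, 1, 0, 1])    # clusters mixed across the gap
```

A genetic algorithm can use this value directly as fitness, since higher scores indicate a better partition.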

17 citations


Journal ArticleDOI
01 May 2021
TL;DR: This work investigates Android botnets using static analysis to extract possible features from the applications source code after being reverse engineered and proposes a new set of features related to accessing resources on the target mobile.
Abstract: Today, Android is among the most well-known and widespread smartphone operating systems. It has millions of applications that are distributed through either accredited or informal stores. Botnet applications are classified as malware that can be distributed through these stores and downloaded by unfortunate users onto their smartphones. This work investigates Android botnets using static analysis to extract possible features from the applications' source code after being reverse engineered. The features are then used to develop effective machine learning models to detect such malicious applications. Additionally, the study proposes a new set of features related to accessing resources on the target mobile. The features are extracted from 1928 Android botnet applications (ISCX dataset) and 2224 Android benign applications (downloaded and scanned by special tools developed as part of this work). The extracted features are categorized into six groups of features in addition to a group that contains all the extracted features. Each group of features undergoes training and testing processes using four popular ML classifiers (i.e., Random Forest, Multi-Layer Perceptron neural networks, Decision Trees, and Naive Bayes). After comparing the results and performing feature importance analysis, it can be noted that the URL set of features plays the key role in the Android botnet detection problem and that the Random Forest classifier obtains the best results based on all sets of features.

Journal ArticleDOI
TL;DR: In this article, a deep learning-based language generation model was proposed to simplify the process of writing medical recommendations for doctors in an Arabic context, to improve service satisfaction and patient-doctor interactions.
Abstract: We are currently witnessing an immense proliferation of natural language processing (NLP) applications. Natural language generation (NLG) has emerged from NLP and is now commonly utilized in various applications, including chatting applications. The objective of this paper is to propose a deep learning-based language generation model that simplifies the process of writing medical recommendations for doctors in an Arabic context, to improve service satisfaction and patient-doctor interactions. The developed language generation model is a predictive text system intended for next word prediction in a telemedicine service. Altibbi—a digital platform for telemedicine and teleconsultations services in the Middle East and the North Africa (MENA) region—was utilized as a case study for the textual prediction process. The proposed model was trained using data obtained from Altibbi databases related to medical recommendations, particularly gynecology, dermatology, psychiatric diseases, urology, and internist diseases. Variants of deep learning models were implemented and optimized for next word prediction, based on the unidirectional and bidirectional long short-term memory (LSTM and BiLSTM), the one-dimensional convolutional neural network (CONV1D), and a combination of LSTM and CONV1D (LSTM-CONV1D). The algorithms were trained using two versions of the datasets (i.e., 3-gram and 4-gram representations) and evaluated in terms of their training accuracy and loss, validation accuracy and loss, and testing accuracy per their matching scores. The proposed models’ performances were comparable. CONV1D produced the most promising matching score.
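To make the 3-gram and 4-gram dataset representations concrete, the sketch below turns a token sequence into (context, next word) training pairs. It assumes that an n-gram representation means n-1 context words followed by the word to predict, which the abstract does not state explicitly; the English sentence is a placeholder, not data from Altibbi.

```python
def ngram_pairs(tokens, n):
    """Split tokens into (context, next_word) pairs, where the context is
    the n-1 preceding words and next_word is the prediction target."""
    return [(tuple(tokens[i:i + n - 1]), tokens[i + n - 1])
            for i in range(len(tokens) - n + 1)]

# Placeholder sentence standing in for a medical recommendation.
tokens = "drink plenty of water daily".split()
pairs_3gram = ngram_pairs(tokens, 3)
pairs_4gram = ngram_pairs(tokens, 4)
```

A next-word model such as an LSTM or a 1D convolutional network would then be trained to map each context to its target word.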

Journal ArticleDOI
01 May 2021
TL;DR: This paper is an extension to the existing EvoCluster framework in which it includes different distance measures for the objective function, different techniques of detecting the k value, and a user option to consider either supervised or unsupervised datasets.
Abstract: EvoCluster is an open source and cross-platform framework implemented in Python language, which includes the most well-known and recent nature-inspired metaheuristic optimizers that are customized to perform partitional clustering tasks. This paper is an extension to the existing EvoCluster framework in which it includes different distance measures for the objective function, different techniques of detecting the k value, and a user option to consider either supervised or unsupervised datasets. The current implementation of the framework includes ten metaheuristic optimizers, thirty datasets, five objective functions, twelve evaluation measures, more than twenty distance measures, and ten different ways for detecting the k value. The source code of EvoCluster is publicly available at http://evo-ml.com/evocluster/ .

Journal ArticleDOI
TL;DR: An intelligent diagnosis decision support system as part of a telemedicine platform for serving the Middle East and North Africa (MENA) region is proposed that encompasses a fusion of machine learning models trained based on two modalities: the symptoms and the medical questions of the patients.

Journal ArticleDOI
02 Jan 2021
TL;DR: In this paper, an improved evolutionary variant of competitive swarm optimizer (CSO) is proposed to evolve the parameters of SVMs and optimize the weights of features to enrich the efficacy of the SVMs based on simultaneous optimization of the parameters and feature weighting.
Abstract: To deal with classification problems, support vector machines (SVMs) are utilized in a wide variety of applications as an effective and powerful supervised learning paradigm. However, the efficacy and outcomes of an SVM-based classification model are influenced by the proper selection of SVM parameters in addition to the nature of the datasets. Therefore, the purpose of this work is to enrich the efficacy of SVMs based on simultaneous optimization of the parameters and feature weighting of these models. In this paper, an improved evolutionary variant of competitive swarm optimizer (CSO) is proposed to evolve the parameters of SVMs and optimize the weights of features. Simulations and experiments are performed based on various datasets from the UCI repository to investigate the effectiveness of the proposed hybrid CSO-based SVM model versus the genetic algorithm, the particle swarm optimizer, and the classical grid-based search. Results and analysis reveal that the proposed crossover-based mechanism inside CSO has improved the classification capabilities of the hybrid CSO-SVM technique.

Journal ArticleDOI
TL;DR: In this paper, a new feature selection approach is proposed based on a modified Teaching-Learning-based Optimization (TLBO) combined with four new binarization methods: the Elitist, the Elitist Roulette, the Elitist Tournament, and the Rank-based method.
Abstract: Machine learning techniques heavily rely on available training data in a data set. Certain features in the data can interfere with the learning process, so it is required to remove irrelevant and redundant features to build a robust training model. As such, several feature selection techniques are usually applied in a pre-processing phase to obtain the most appropriate set of features and improve the overall learning process. In this paper, a new feature selection approach is proposed based on a modified Teaching-Learning-based Optimization (TLBO) combined with four new binarization methods: the Elitist, the Elitist Roulette, the Elitist Tournament, and the Rank-based method. The influence of these binarization methods is studied and compared to other state-of-the-art techniques. The experimental results such as Shapiro-Wilk normality and Wilcoxon ranksum test show that both transfer functions and binarization approaches have a significant influence on the effectiveness of the binary TLBO. The experiments show that choosing a fitting transfer function along with a suitable binarization method has a substantial impact on the exploratory and exploitative potentials of the feature selection technique.


Journal ArticleDOI
TL;DR: In this paper, an Arabic neural-based word embedding model is proposed for clinical decision support systems in the medical and healthcare context, which is based on word clustering and the similarity of words.
Abstract: In recent years, the utilization of natural language processing (NLP) and Machine Learning (ML) techniques in clinical decision support systems has shown its ability to improve and automate the diagnosis process and to reduce potential clinical errors. NLP in the Arabic language is more intricate due to several limitations, such as the lack of datasets and analytical resources compared to other languages like English. However, a clinical decision support system in the Arabic context is of significant importance. A fundamental process in NLP is extracting features from text-based data via text embedding. Word embedding is a representation of words in a numeric format that encodes statistical, semantic, or contextual information. Building a neural word embedding model requires hundreds of thousands of data instances to find hidden patterns of relationships within sentences. Essentially, extracting relevant and informative features promotes the performance of the learning algorithms. The objective of this paper is to propose an Arabic neural-based word embedding model in the medical and healthcare context (called “AltibbiVec”). Around 1.5 million medical consultations and questions written in different dialects are obtained from Altibbi telemedicine company and used to train the embedding model. Three different embedding models are developed and compared, which are Word2Vec, fastText, and GloVe. The trained models were evaluated by different criteria, including word clustering, word similarity, and a specialty-based question classification task. The results show that Word2Vec and fastText capture the semantics of the text better than GloVe. Hence, they are recommended for healthcare NLP-based applications.
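Word similarity in an embedding space is usually measured with cosine similarity. The sketch below assumes that measure (the abstract does not name one) and uses tiny made-up two-dimensional vectors with English placeholder words; real AltibbiVec vectors are not reproduced here.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def most_similar(word, vectors, top_n=3):
    """Rank the other words in vectors by cosine similarity to word."""
    scored = [(other, cosine_similarity(vectors[word], vec))
              for other, vec in vectors.items() if other != word]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_n]

# Hypothetical toy embeddings, for illustration only.
toy_vectors = {
    "fever": [0.9, 0.1],
    "temperature": [0.8, 0.2],
    "fracture": [0.1, 0.9],
}
neighbours = most_similar("fever", toy_vectors, top_n=2)
```

With these toy values, "temperature" ranks above "fracture" as a neighbour of "fever", which is the kind of behaviour a similarity-based evaluation checks for.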

Journal ArticleDOI
TL;DR: In this article, a binary multi-objective variant of MVO (MOMVO) is proposed to deal with feature selection tasks, which can effectively eliminate irrelevant and/or redundant features and maintain a minimum classification error rate when dealing with different datasets.
Abstract: Classification tasks often include, among the large number of features to be processed in the datasets, many irrelevant and redundant ones, which can even decrease the efficiency of classifiers. Feature Selection (FS) is the most common preprocessing technique utilized to overcome the drawbacks of the high dimensionality of datasets and often has two conflicting objectives: the first aims to maximize the classification performance or reduce the error rate of the classifier, whereas the second is designed to minimize the number of features. However, the majority of wrapper FS techniques are developed for single-objective scenarios. Multi-verse optimizer (MVO) is considered one of the well-regarded optimization approaches of recent years. In this paper, the binary multi-objective variant of MVO (MOMVO) is proposed to deal with feature selection tasks. The standard MOMVO suffers from local optima stagnation, so we propose an improved binary MOMVO to deal with this issue using the memory concept and the personal best of the universes. The experimental results and comparisons indicate that the proposed binary MOMVO approach can effectively eliminate irrelevant and/or redundant features and maintain a minimum classification error rate when dealing with different datasets, compared with the most popular feature selection techniques. Furthermore, experiments on 14 benchmark datasets showed that the proposed approach outperforms the state-of-the-art multi-objective optimization algorithms for feature selection.
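With two conflicting objectives, candidate feature subsets are compared by Pareto dominance rather than by a single score. The sketch below shows the standard dominance test for minimization and extracts the non-dominated set; the (error rate, number of features) values are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better
    on at least one (all objectives are minimised)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(solutions):
    """Return the solutions that no other solution dominates."""
    return [s for s in solutions
            if not any(dominates(other, s) for other in solutions)]

# Hypothetical (error_rate, n_features) pairs for five feature subsets.
candidates = [(0.10, 8), (0.12, 5), (0.10, 10), (0.20, 5), (0.08, 12)]
front = pareto_front(candidates)
```

The front keeps the trade-offs, for example a lower error with more features, and drops subsets that are worse on one objective without being better on the other.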

Book ChapterDOI
01 Jan 2021
TL;DR: In this article, a hybrid approach for detecting ransomware is proposed, which combines evolutionary clustering approach using Grey Wolf Optimizer (GWO) with an ensemble of Support Vector Machine (SVM) classification.
Abstract: Ransomware analysis and detection has been recently applied using supervised and unsupervised machine learning approaches. Combining both approaches simplifies the complexity of the data and builds an expert classifier for each predicted cluster. In this paper, a hybrid approach for detecting ransomware is proposed. It combines evolutionary clustering approach using Grey Wolf Optimizer (GWO) with an ensemble of Support Vector Machine (SVM) classification on a dataset collected from Android benign and ransomware applications. Experiments are applied to identify the best number of clusters for the proposed approach on the selected dataset and to compare the results with those obtained from applying the SVM single classification approach. Results show that applying SVM-\(GWO_{k=2}\) approach outperforms the corresponding SVM classifier in terms of accuracy.

Journal ArticleDOI
TL;DR: In this paper, the authors proposed a deep learning approach for question classification, since deep learning methods have the powerful capability to extract implicit, hidden relationships and automatically generate dense representations of features.
Abstract: Automated question classification is a fundamental component of automated question-answering systems, which plays a critical role in promoting medical and healthcare services. Developing an automated question classification system depends heavily on natural language processing and data mining techniques. Question classification methods based on classical machine learning techniques face limitations in capturing the hidden relationships of features, as well as in handling complex languages and very large-scale datasets. Therefore, this paper proposes a deep learning approach for question classification, since deep learning methods have the powerful capability to extract implicit, hidden relationships and automatically generate dense representations of features. The proposed question classification model depends on unidirectional and bidirectional long short-term memory networks (LSTM and BiLSTM), which are developed to handle the Arabic language in the field of healthcare. The features are represented and created using a domain-specific word embedding model (Word2Vec) that is constructed by training on around 1.5 million medical consultations from Altibbi company. Altibbi is a telemedicine company that is used as a case study and a source for curating and collecting the data. The proposed deep learning approach is a multi-class classification algorithm that automatically labels and maps the questions into 15 categories of medical specialities. The proposed deep learning model is evaluated using several evaluation metrics, including accuracy, precision, recall, and F1-score. Markedly, the proposed model achieved a classification accuracy of 87.2%.

Proceedings ArticleDOI
14 Jul 2021
TL;DR: In this article, the authors proposed a deep learning approach to detect different types of intrusion activities using a multistage mechanism and an oversampling process which solves the problem of the imbalanced data produced by the IoT devices.
Abstract: Intrusion Detection Systems for IoT networks have emerged to solve the vulnerabilities caused by the extensive utilization of IoT devices for different applications. Intrusion Detection Systems are not only limited to predicting the existence of intrusion activities apart from the normal ones but it is also extended to identify different types of intrusion activities that allow for a larger scale of recovery actions toward solving this security breach. This study proposes a deep learning approach to detect different types of intrusion activities using a multistage mechanism and an oversampling process which solves the problem of the imbalanced data produced by the IoT devices. This work confirms that the selected classification techniques are not able to detect all types of intrusion for imbalanced data if not combined with the oversampling process. It also compares the proposed approach with other classification and deep learning techniques which consider the oversampling process as part of their pre-processing phase. The results presented in this work show that the proposed approach outperforms the other techniques in terms of Accuracy and G-mean and has an advantage over the other techniques in predicting the different types of intrusion in terms of Precision and Recall.

Journal ArticleDOI
TL;DR: In this article, three types of machine learning (ML) models including support vector machine (SVM), random forest (RF), and gradient boosted decision tree (GBDT) were developed for total dissolved salt (TDS) prediction over several locations in the Iraq region.
Abstract: Quantification of the soil physicochemical properties is one of the essential processes in the field of soil geoscience. In the current research, three types of machine learning (ML) models including support vector machine (SVM), random forest (RF), and gradient boosted decision tree (GBDT) were developed for Total Dissolved Salt (TDS) prediction over several locations in the Iraq region. Various physicochemical soil properties were used as predictors for the TDS prediction. Four modeling scenarios are constructed based on the types of the associated soil input variables. The applied ML models were analyzed and discussed based on several statistical measures and graphical presentations. Based on the correlation analysis, gypsum concentration, sulfur trioxide ($SO_{3}$), chloride (Cl), and organic matter (OR) were the soil properties with the most influence on the TDS concentration. The prediction results indicated that incorporating all the types of input variables, including chemical properties, soil consistency limits, and soil sieve analysis, attained the best prediction performance. In quantitative terms, the SVM model attained the maximum coefficient of determination ($R^{2}=0.849$) and minimum root mean square error (RMSE=3.882). Overall, the development of the ML models for the TDS of soil prediction provided a robust and reliable methodology that contributes to the soil geoscience field.
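The two reported statistics follow standard definitions: RMSE is the square root of the mean squared residual, and the coefficient of determination is one minus the ratio of the residual sum of squares to the total sum of squares. A minimal sketch with made-up numbers (not the paper's data):

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between observations and predictions."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical observed values and two sets of predictions.
observed = [1.0, 2.0, 3.0, 4.0]
perfect = [1.0, 2.0, 3.0, 4.0]
mean_only = [2.5, 2.5, 2.5, 2.5]
```

A model that always predicts the mean scores an R² of zero, so a value such as 0.849 indicates that most of the variance in the target is explained.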

Book ChapterDOI
01 Jan 2021
TL;DR: In this article, the role of evolutionary and swarm intelligence algorithms for data clustering, which showed extreme advantages over the classical clustering algorithms, is discussed and the main idea is to present thoroughly the clustering validation indices that are found in literature, indicating when they were utilized with evolutionary clustering and when used as an objective function.
Abstract: Data clustering is among the most commonly investigated types of unsupervised learning, owing to its ability to capture the underlying information. Accordingly, data clustering has attracted increasing interest in various applications involving health, humanities, and industry. Assessing the goodness of clustering has been widely debated across the history of clustering analysis, which led to the emergence of abundant clustering evaluation measures. The aim of clustering evaluation, often referred to as clustering validation, is to quantify the quality of the potential clusters. There are two broad categories of clustering validation: external and internal measures. They differ mainly in whether or not they rely on external true labels of the data. This chapter considers the role of evolutionary and swarm intelligence algorithms for data clustering, which showed extreme advantages over the classical clustering algorithms. The main idea of this chapter is to present thoroughly the clustering validation indices that are found in the literature, indicating when they were utilized with evolutionary clustering and when they were used as an objective function.
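
The difference between the two categories can be illustrated with one measure from each. The indices below (purity as an external measure, within-cluster sum of squared errors as an internal one) are chosen for illustration and are not necessarily the ones the chapter surveys; the data points are hypothetical.

```python
from collections import Counter, defaultdict


def purity(labels_true, labels_pred):
    """External measure: needs true labels. Fraction of points in their cluster's majority class."""
    clusters = defaultdict(list)
    for truth, cluster in zip(labels_true, labels_pred):
        clusters[cluster].append(truth)
    majority = sum(Counter(members).most_common(1)[0][1] for members in clusters.values())
    return majority / len(labels_true)


def sse(points, labels_pred):
    """Internal measure: uses only the data. Sum of squared distances to cluster centroids."""
    clusters = defaultdict(list)
    for point, cluster in zip(points, labels_pred):
        clusters[cluster].append(point)
    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        total += sum((m[d] - centroid[d]) ** 2 for m in members for d in range(dim))
    return total


points = [(0, 0), (0, 2), (10, 0), (10, 2)]
labels_true = ["a", "a", "b", "b"]
labels_pred = [0, 0, 1, 1]

print(purity(labels_true, labels_pred))  # 1.0
print(sse(points, labels_pred))          # 4.0
```

Because the internal measure needs no labels, it is the kind of index that can serve as an objective function inside an evolutionary clustering loop.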

Journal ArticleDOI
TL;DR: An approach for training Random Weight Networks is proposed based on a recent variant of particle swarm optimization called competitive swarm optimization, which will automatically tune the weights, biases, the number of hidden neurons, and regularization factor as well as the embedded activation function in the network, simultaneously.
Abstract: Random Weight Networks have been extensively used in many applications in the last decade because they have many strong features such as fast learning and good generalization performance. Most of the traditional training techniques for Random Weight Networks randomly select the connection weights and hidden biases and thus suffer from local optima stagnation and degraded convergence. The literature shows that stochastic population-based optimization techniques are a well-regarded and reliable alternative for Random Weight Network optimization because of their high local optima avoidance and flexibility. In addition, many practitioners and non-expert users find it difficult to set the other parameters of the network, such as the number of hidden neurons, the activation function, and the regularization factor. In this paper, an approach for training Random Weight Networks is proposed based on a recent variant of particle swarm optimization called competitive swarm optimization. Unlike most Random Weight Network training techniques, which optimize only the input weights and hidden biases, the proposed approach automatically and simultaneously tunes the weights, biases, number of hidden neurons, and regularization factor, as well as the embedded activation function in the network. The goal is to help users effectively identify a proper structure and hyperparameter values for their applications while obtaining reasonable prediction results. Twenty benchmark classification datasets are used to compare the proposed approach with different types of basic and hybrid Random Weight Network-based models. The experimental results on the benchmark datasets show that reasonable classification results can be obtained by automatically tuning the hyperparameters using the proposed approach.
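
A minimal sketch of the competitive swarm optimization update, as it is commonly described, may help: particles are paired at random, the winner of each pair is kept unchanged, and the loser moves toward the winner and the swarm mean. The sphere objective, the bounds, and all parameter values are assumptions made for illustration; the paper's encoding of weights, biases, hidden neurons, regularization factor, and activation function is not reproduced here.

```python
import random


def sphere(x):
    """Toy objective (assumption): stands in for the network's training error."""
    return sum(v * v for v in x)


def cso_minimize(fitness, dim, swarm_size=20, iterations=100, phi=0.1, seed=0):
    """Pairwise-competition swarm search; swarm_size is assumed to be even."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(swarm_size)]
    vel = [[0.0] * dim for _ in range(swarm_size)]
    for _ in range(iterations):
        mean = [sum(p[d] for p in pos) / swarm_size for d in range(dim)]
        order = list(range(swarm_size))
        rng.shuffle(order)
        # Each particle competes exactly once per iteration.
        for a, b in zip(order[0::2], order[1::2]):
            winner, loser = (a, b) if fitness(pos[a]) <= fitness(pos[b]) else (b, a)
            for d in range(dim):
                r1, r2, r3 = rng.random(), rng.random(), rng.random()
                # Only the loser is updated: it learns from the winner and the swarm mean.
                vel[loser][d] = (r1 * vel[loser][d]
                                 + r2 * (pos[winner][d] - pos[loser][d])
                                 + phi * r3 * (mean[d] - pos[loser][d]))
                pos[loser][d] += vel[loser][d]
    return min(pos, key=fitness)


best = cso_minimize(sphere, dim=3)
print(best, sphere(best))
```

Since winners pass through unchanged, the best particle in the swarm never gets worse from one iteration to the next, and no personal-best or global-best memory is needed.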

Journal ArticleDOI
TL;DR: The experimental results showed that reducing the number of features and minimising the complexity of the RWN led to the high performance of the proposed method, which exceeded all other methods in terms of accuracy on most datasets.
Abstract: Learning algorithms are mainly used to optimize a performance criterion. The random weight network (RWN) is a learning algorithm with strong performance that is used in a wide range of applications. However, the performance of the RWN is highly affected by the number of data inputs, which is why finding the best input features becomes a necessity when dealing with high-dimensional data. In the literature, many methods have attempted to determine the optimal subset of features and the structure of the RWN separately. In this paper, we propose a cooperative coevolution method based on Particle Swarm Optimisation. The goal of the proposed method is to optimise the structure of the RWN and simultaneously find the best subset of features. Three experiments are conducted on thirty medical classification datasets to assess the accuracy of the proposed method. The experimental results showed that reducing the number of features and minimising the complexity of the RWN led to the high performance of the proposed method, which exceeded all other methods in terms of accuracy on most datasets.

Journal ArticleDOI
10 May 2021-Sensors
TL;DR: In this paper, a deep learning-based classification model was proposed for the quality assessment of patient-doctor voice-based conversations in a telehealth service using audio recordings obtained from Altibbi.
Abstract: Maintaining a high quality of conversation between doctors and patients is essential in telehealth services, where efficient and competent communication is important to promote patient health. Assessing the quality of medical conversations is often handled based on a human auditory-perceptual evaluation. Typically, trained experts are needed for such tasks, as they follow systematic evaluation criteria. However, the daily rapid increase of consultations makes the evaluation process inefficient and impractical. This paper investigates the automation of the quality assessment process of patient–doctor voice-based conversations in a telehealth service using a deep-learning-based classification model. For this, the data consist of audio recordings obtained from Altibbi. Altibbi is a digital health platform that provides telemedicine and telehealth services in the Middle East and North Africa (MENA). The objective is to assist Altibbi’s operations team in the evaluation of the provided consultations in an automated manner. The proposed model is developed using three sets of features: features extracted from the signal level, the transcript level, and the signal and transcript levels. At the signal level, various statistical and spectral information is calculated to characterize the spectral envelope of the speech recordings. At the transcript level, a pre-trained embedding model is utilized to encompass the semantic and contextual features of the textual information. Additionally, the hybrid of the signal and transcript levels is explored and analyzed. The designed classification model relies on stacked layers of deep neural networks and convolutional neural networks. Evaluation results show that the model achieved a higher level of precision when compared with the manual evaluation approach followed by Altibbi’s operations team.

Book ChapterDOI
01 Jan 2021
TL;DR: In this paper, the authors present an introduction to clustering and evolutionary data clustering, thoroughly review the applications of evolutionary data clustering and its implementation approaches, and anticipate the use of evolutionary algorithms for addressing the problem of clustering optimization.
Abstract: Clustering is concerned with splitting a dataset into groups (clusters) that represent the natural homogeneous characteristics of the data. Remarkably, clustering has a crucial role in numerous types of applications. These applications include social sciences, biological and medical applications, information retrieval and web search algorithms, pattern recognition, image processing, machine learning, and data mining. Even though clustering is ubiquitous across a variety of areas, clustering approaches suffer from several drawbacks. Mainly, they are highly susceptible to the clusters' initial centroids, which allows a particular dataset to easily fall into a local optimum. Clustering, when handled as an optimization problem, is deemed NP-hard. However, metaheuristic algorithms are a dominant class of algorithms for solving tough and NP-hard optimization problems. This chapter anticipates the use of evolutionary algorithms for addressing the problem of clustering optimization. Therefore, it presents an introduction to clustering and evolutionary data clustering, and thoroughly reviews the applications of evolutionary data clustering and its implementation approaches.

Journal ArticleDOI
TL;DR: An evolutionary model based on binary particle swarm optimization (BPSO) combined with random weight networks (RWNs) as an induction algorithm to reduce the high dimensionality of features in the Arabic web pages and to perform document classification automatically is proposed.
Abstract: Nowadays, a huge number of web documents are available on the Internet, which makes retrieving a specific topic very difficult, since some irrelevant pages may be retrieved as well. The automatic classification of web documents and pages has essential applications in different domains such as medicine, health, science, and information technology. A large number of web page classification methods have been proposed to improve search capabilities, especially for the English language. In addition, the current classification methods attempt to classify English web pages and, at the same time, to reduce the high dimensionality of the features extracted from these web pages. Due to the lack of classification methods for other languages, this paper focuses on Arabic web page classification, owing to the scarcity of such methods as well as the importance of the Arabic language. In particular, we propose an evolutionary model based on binary particle swarm optimization (BPSO) combined with random weight networks (RWNs) as an induction algorithm to reduce the high dimensionality of features in Arabic web pages and to perform document classification automatically. The datasets used in this paper were collected from popular Arabic websites. We collected three different datasets relating to three different fields, namely Computer Science, Science, and Health. Further, the Taguchi method is incorporated to locate the best parameters of the proposed algorithm. The experimental results showed that the proposed model gives better performance results for Arabic web page classification. In addition, an analysis study was conducted to identify the most important features learned from the proposed model as well as the most important tags. The results showed that the list tag obtained the highest percentage, which reflects its effectiveness in the classification of Arabic web pages.
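
The feature-selection side of BPSO can be sketched as below, assuming the classic binary PSO formulation in which a sigmoid of the velocity gives the probability that a bit (feature) is switched on. The fitness function is a toy stand-in with a hypothetical set of relevant features; in the paper the fitness comes from an RWN classifier, which is not reproduced here, and all parameter values are assumptions.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def bpso(fitness, n_bits, swarm_size=10, iterations=30,
         w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Maximize `fitness` over bit masks; bit d == 1 means feature d is selected."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(swarm_size)]
    vel = [[0.0] * n_bits for _ in range(swarm_size)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    lead = max(range(swarm_size), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[lead][:], pbest_fit[lead]
    for _ in range(iterations):
        for i in range(swarm_size):
            for d in range(n_bits):
                v = (w * vel[i][d]
                     + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                     + c2 * rng.random() * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))  # clamp the velocity
                # The sigmoid turns the velocity into a probability of selecting feature d.
                pos[i][d] = 1 if rng.random() < sigmoid(vel[i][d]) else 0
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


RELEVANT = {0, 3}  # hypothetical "useful" feature indices


def toy_fitness(mask):
    """Reward relevant features and penalize mask size (stand-in for classifier accuracy)."""
    return sum(mask[i] for i in RELEVANT) - 0.1 * sum(mask)


mask, score = bpso(toy_fitness, n_bits=8)
print(mask, score)
```

The size penalty in the toy fitness plays the role of the dimensionality-reduction pressure described in the abstract: a mask that keeps fewer features scores higher when accuracy is equal.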

Book ChapterDOI
01 Jan 2021
TL;DR: In this article, the authors used Grey Wolf Optimizer (GWO) on seven medical data sets to optimize the initial clustering centroids represented by the individuals of each population at each iteration.
Abstract: Evolutionary and swarm intelligence algorithms are used as optimization algorithms for solving the clustering problem. One of the most popular optimization algorithms is the Grey Wolf Optimizer (GWO). In this chapter, we use GWO on seven medical data sets to optimize the initial clustering centroids represented by the individuals of each population at each iteration. The aim is to minimize the distances between instances of the same cluster to predict certain diseases and medical problems. The results show that solving the clustering task using GWO outperforms the other well-regarded evolutionary and swarm intelligence clustering algorithms, by converging toward enhanced solutions having low dispersion from the average values, for all the selected data sets.
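
A compact sketch of centroid optimization with the standard GWO update is given below. Each wolf encodes k centroids as one flat vector, and the objective is the total distance from every instance to its nearest centroid. The encoding, the objective, the parameter values, and the toy data are assumptions for illustration; the chapter's exact setup and its medical data sets are not reproduced.

```python
import math
import random


def intra_cluster_distance(flat_centroids, data, k):
    """Sum of distances from each instance to its nearest centroid (lower is better)."""
    dim = len(data[0])
    centroids = [flat_centroids[i * dim:(i + 1) * dim] for i in range(k)]
    return sum(min(math.dist(x, c) for c in centroids) for x in data)


def gwo_cluster(data, k, wolves=10, iterations=50, seed=0):
    rng = random.Random(seed)
    dim = len(data[0])
    lo = [min(x[d] for x in data) for d in range(dim)]
    hi = [max(x[d] for x in data) for d in range(dim)]
    n = k * dim

    def cost(wolf):
        return intra_cluster_distance(wolf, data, k)

    # Each wolf is a candidate set of k centroids drawn inside the data bounds.
    pack = [[rng.uniform(lo[j % dim], hi[j % dim]) for j in range(n)]
            for _ in range(wolves)]
    best = min(pack, key=cost)[:]
    for t in range(iterations):
        a = 2.0 * (1 - t / iterations)  # decreases linearly from 2 toward 0
        alpha, beta, delta = [w[:] for w in sorted(pack, key=cost)[:3]]
        for wolf in pack:
            for j in range(n):
                pulled = 0.0
                for leader in (alpha, beta, delta):
                    A = 2 * a * rng.random() - a
                    C = 2 * rng.random()
                    pulled += leader[j] - A * abs(C * leader[j] - wolf[j])
                wolf[j] = pulled / 3.0  # average of the three leader-guided moves
        candidate = min(pack, key=cost)
        if cost(candidate) < cost(best):
            best = candidate[:]
    return best, cost(best)


# Two hypothetical, well-separated groups of 2-D points.
data = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centroids, total = gwo_cluster(data, k=2)
print(centroids, total)
```

The best solution found so far is tracked separately from the pack, so the returned cost never exceeds that of the best initial wolf.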