Author

Zuhaira Muhammad Zain

Bio: Zuhaira Muhammad Zain is an academic researcher from Princess Nora bint Abdul Rahman University. The author has contributed to research in the topics of machine learning and artificial intelligence. The author has an h-index of 3 and has co-authored 6 publications receiving 75 citations.

Papers
Journal Article
TL;DR: This research embeds a particle swarm optimization as feature selection into three renowned classifiers, namely, naive Bayes, K-nearest neighbor, and fast decision tree learner, with the objective of increasing the accuracy level of the prediction model.
Abstract: Women who have recovered from breast cancer (BC) always fear its recurrence. The fact that they have endured the painstaking treatment makes recurrence their greatest fear. However, with current advancements in technology, early recurrence prediction can help patients receive treatment earlier. The availability of extensive data and advanced methods make accurate and fast prediction possible. This research aims to compare the accuracy of a few existing data mining algorithms in predicting BC recurrence. It embeds a particle swarm optimization as feature selection into three renowned classifiers, namely, naive Bayes, K-nearest neighbor, and fast decision tree learner, with the objective of increasing the accuracy level of the prediction model.

130 citations

Journal Article
TL;DR: Combining both models in the proposed CNN-LSTM encoder-decoder structure provides a significant boost in forecasting performance, and it is demonstrated that the suggested model produced satisfactory predicting results even with a small amount of data.
Abstract: COVID-19 has sparked a worldwide pandemic, with the number of infected cases and deaths rising on a regular basis. Along with recent advances in soft computing technology, researchers are now actively developing and enhancing different mathematical and machine-learning algorithms to forecast the future trend of this pandemic. Thus, if we can accurately forecast the trend of cases globally, the spread of the pandemic can be controlled. In this study, a hybrid CNN-LSTM model was developed on a time-series dataset to forecast the number of confirmed cases of COVID-19. The proposed model was evaluated and compared with 17 baseline models on test and forecast data. The primary finding of this research is that the proposed CNN-LSTM model outperformed them all, with the lowest average MAPE, RMSE, and RRMSE values on both test and forecast data. Conclusively, our experimental results show that, while standalone CNN and LSTM models provide acceptable and efficient forecasting performance for the confirmed COVID-19 cases time series, combining both models in the proposed CNN-LSTM encoder-decoder structure provides a significant boost in forecasting performance. Furthermore, we demonstrated that the suggested model produced satisfactory predicting results even with a small amount of data.

27 citations
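
The abstract above ranks models by MAPE, RMSE, and RRMSE. As a rough illustration of how these error measures are commonly computed, here is a minimal sketch. The function names are hypothetical, and the RRMSE shown (RMSE divided by the mean of the observed values) is only one of several definitions in use; the paper's exact formula is not given in the abstract.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Assumes no actual value is zero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the data."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def rrmse(actual, predicted):
    """Relative RMSE: RMSE divided by the mean observed value (assumed definition)."""
    return rmse(actual, predicted) / (sum(actual) / len(actual))


# Illustrative numbers only, not data from the paper.
observed = [100, 200, 400]
forecast = [110, 180, 400]
scores = {
    "MAPE": mape(observed, forecast),
    "RMSE": rmse(observed, forecast),
    "RRMSE": rrmse(observed, forecast),
}
```

Lower values are better for all three measures; MAPE and RRMSE are scale-free, which makes them easier to compare across series of different magnitude.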

Journal Article
TL;DR: Three established data mining algorithms: Naive Bayes (NB), k-nearest neighbor (KNN), and fast decision tree (REPTree), adopting the feature extraction algorithm, principal component analysis (PCA), for predicting breast cancer recurrence were contrasted.
Abstract: Breast cancer recurrence is among the most noteworthy fears faced by women. Nevertheless, with modern innovations in data mining technology, early recurrence prediction can help relieve these fears. Although medical information is typically complicated, and simplifying searches to the most relevant input is challenging, new sophisticated data mining techniques promise accurate predictions from high-dimensional data. In this study, the performances of three established data mining algorithms: Naive Bayes (NB), k-nearest neighbor (KNN), and fast decision tree (REPTree), adopting the feature extraction algorithm, principal component analysis (PCA), for predicting breast cancer recurrence were contrasted. The comparison was conducted between models built in the absence and presence of PCA. The results showed that KNN produced better prediction without PCA (F-measure = 72.1%), whereas the other two techniques: NB and REPTree, improved when used with PCA (F-measure = 76.1% and 72.8%, respectively). This study can benefit the healthcare industry in assisting physicians in predicting breast cancer recurrence precisely.

10 citations

Journal Article
TL;DR: The accuracy rate of the proposed malware classification (MC) method was extremely high, making it the most efficient option available, and it outperformed both the hand-crafted feature and deep feature techniques.
Abstract: Malware development has significantly increased recently, posing a serious security risk to both consumers and businesses. Malware developers continually find new ways to circumvent security research’s ongoing efforts to guard against malware attacks. Malware Classification (MC) entails assigning a malware class to a specific sample, while malware detection merely entails finding malware without identifying which kind of malware it is. There are two main reasons why the most popular MC techniques have a low classification rate. First, finding and developing accurate features requires highly specialized domain expertise. Second, data imbalance makes it challenging to classify and correctly identify malware. The proposed malware classification (MC) method consists of the following five steps: (i) Dataset preparation: 2D malware images are created from the malware binary files; (ii) Visualized Malware Pre-processing: the visual malware images need to be scaled to fit the CNN model’s input size; (iii) Feature extraction: both hand-engineering (Tamura) and deep learning (GoogLeNet) techniques are used to extract the features in this step; (iv) Classification: to perform malware classification, we employed k-Nearest Neighbor (KNN), Support Vector Machines (SVM), and Extreme Learning Machine (ELM). The proposed method is tested on the standard, unbalanced Malimg dataset. The accuracy rate of the proposed method was extremely high, making it the most efficient option available. The proposed method’s accuracy rate outperformed both the Hand-crafted feature and Deep Feature techniques, at 95.42 and 96.84 percent.

5 citations
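
Steps (i) and (ii) of the abstract above, turning a binary file into a 2D image and scaling it to a CNN's input size, can be sketched roughly as follows. This is not the authors' pipeline: the function names, the default row width of 256 bytes, zero-padding of the last row, and nearest-neighbour scaling are all assumptions made for illustration.

```python
def bytes_to_image(data: bytes, width: int = 256) -> list[list[int]]:
    """Lay bytes out row by row as 8-bit grayscale pixels (0-255).

    The last row is zero-padded to the full width (an assumed convention).
    """
    if width <= 0:
        raise ValueError("width must be positive")
    padded = data + bytes((-len(data)) % width)
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]


def resize_nearest(image: list[list[int]], out_height: int, out_width: int) -> list[list[int]]:
    """Scale a non-empty image to out_height x out_width by nearest-neighbour sampling."""
    in_height, in_width = len(image), len(image[0])
    return [
        [image[r * in_height // out_height][c * in_width // out_width]
         for c in range(out_width)]
        for r in range(out_height)
    ]


# Hypothetical usage: a stand-in byte string instead of a real malware sample,
# scaled to 224 x 224, an input size often used with GoogLeNet (assumed here).
sample = bytes(range(256)) * 8
image = bytes_to_image(sample, width=64)
scaled = resize_nearest(image, 224, 224)
```

In practice an image library would handle the resizing with better interpolation; the pure-Python version only makes the data layout explicit.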

Journal Article
TL;DR: A Rasch analysis to rank the importance of blog quality criteria for the Personal Diary, Socio-political Commentary, Humor/Entertainment, Lifestyle, and Technology blog categories found that some quality criteria and/or families are more important for some categories but less important for others.
Abstract: A blog quality model has been proposed for bloggers to promote reader satisfaction. However, the model does not determine whether readers consider some quality criteria important, particularly with respect to different blog categories. In this paper, we employed a Rasch analysis to rank the importance of blog quality criteria for the Personal Diary, Socio-political Commentary, Humor/Entertainment, Lifestyle, and Technology blog categories. The authors identified the most important quality criteria and families for each category. The authors discovered that (1) the importance of quality criteria and/or families depends on the particular category with which a reader engages; (2) some quality criteria and/or families are more important for some categories but less important for others; and (3) certain quality criteria and/or families are equally important for some categories but not for others. The authors provide empirical evidence of the most important criteria bloggers and evaluators can focus on when they examine different blog categories.

2 citations


Cited by
Journal Article
TL;DR: A comprehensive survey on the state-of-the-art works applying swarm intelligence to achieve feature selection in classification, with a focus on the representation and search mechanisms.
Abstract: One of the major problems in Big Data is a large number of features or dimensions, which causes the issue of “the curse of dimensionality” when applying machine learning, especially classification algorithms. Feature selection is an important technique which selects small and informative feature subsets to improve the learning performance. Feature selection is not an easy task due to its large and complex search space. Recently, swarm intelligence techniques have gained much attention from the feature selection community because of their simplicity and potential global search ability. However, there have been no comprehensive surveys on swarm intelligence for feature selection in classification, which is the most widely investigated area in feature selection. The few short surveys in this area still lack in-depth discussions of the state-of-the-art methods and of the strengths and limitations of existing methods, particularly in terms of the representation and search mechanisms, which are two key components in adapting swarm intelligence to address feature selection problems. This paper presents a comprehensive survey on the state-of-the-art works applying swarm intelligence to achieve feature selection in classification, with a focus on the representation and search mechanisms. The expectation is to present an overview of different kinds of state-of-the-art approaches together with their advantages and disadvantages, encourage researchers to investigate more advanced methods, provide practitioners with guidance for choosing the appropriate methods to be used in real-world scenarios, and discuss potential limitations and issues for future research.

202 citations

Journal Article
01 Sep 2020
TL;DR: Five supervised machine learning techniques named support vector machine (SVM), K-nearest neighbors, random forests, artificial neural networks (ANNs) and logistic regression are compared and it is revealed that the ANNs obtained the highest accuracy, precision, and F1 score.
Abstract: Early detection of disease has become a crucial problem in medical research in recent times due to rapid population growth. With the rapid population growth, the risk of death incurred by breast cancer is rising exponentially. Breast cancer is the second most severe cancer among all of the cancers already unveiled. An automatic disease detection system aids medical staff in disease diagnosis, offers a reliable, effective, and rapid response, and decreases the risk of death. In this paper, we compare five supervised machine learning techniques: support vector machine (SVM), K-nearest neighbors, random forests, artificial neural networks (ANNs), and logistic regression. The Wisconsin Breast Cancer dataset is obtained from a prominent machine learning database, the UCI machine learning database. The performance of the study is measured with respect to accuracy, sensitivity, specificity, precision, negative predictive value, false-negative rate, false-positive rate, F1 score, and Matthews Correlation Coefficient. Additionally, these techniques were appraised on precision–recall area under curve and receiver operating characteristic curve. The results reveal that the ANNs obtained the highest accuracy, precision, and F1 score of 98.57%, 97.82%, and 0.9890, respectively, whereas SVM obtained an accuracy, precision, and F1 score of 97.14%, 95.65%, and 0.9777, respectively.

138 citations
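
The evaluation measures listed in the abstract above all derive from the four cells of a binary confusion matrix. A minimal sketch using the standard textbook definitions follows; the function name is hypothetical, and the code assumes every denominator is non-zero.

```python
import math


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> dict[str, float]:
    """Standard binary-classification measures from confusion-matrix counts.

    Assumes each denominator below is non-zero.
    """
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)  # also called recall or true-positive rate
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "specificity": tn / (tn + fp),
        "precision": precision,
        "negative_predictive_value": tn / (tn + fn),
        "false_negative_rate": fn / (fn + tp),
        "false_positive_rate": fp / (fp + tn),
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
        "mcc": (tp * tn - fp * fn)
        / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }


# Illustrative counts only, not results from the paper.
metrics = binary_metrics(tp=50, tn=40, fp=5, fn=5)
```

Unlike accuracy, the Matthews Correlation Coefficient uses all four cells symmetrically, so it stays informative when the two classes are unbalanced.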

Journal Article
TL;DR: In this paper, a hybrid CNN-LSTM model was proposed to predict lake water levels in Lake Michigan and Lake Ontario by coupling boundary-corrected (BC) Maximal Overlap Discrete Wavelet Transform (MODWT) data preprocessing with a hybrid Convolutional Neural Network (CNN) Long Short-Term Memory (LSTM) deep learning (DL) model.

75 citations