
Showing papers in "Journal of Computer Science in 2021"


Journal ArticleDOI
TL;DR: One-hot encoding was used with DL and ML techniques to classify emails as phishing or non-phishing, demonstrating the effectiveness of semantic analysis in phishing email detection.
Abstract: Representation of text is a significant task in Natural Language Processing (NLP) and in recent years Deep Learning (DL) and Machine Learning (ML) have been widely used in various NLP tasks like topic classification, sentiment analysis and language translation. Until very recently, little work has been devoted to semantic analysis in phishing detection or phishing email detection. The novelty of this study is in using deep semantic analysis to capture inherent characteristics of the text body. One-hot encoding was used with DL and ML techniques to classify emails as phishing or non-phishing. A comparison of various parameters and hyperparameters was performed for DL. The results of various ML models, Naïve Bayes, SVM, Decision Tree, as well as DL models, Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM), were presented. The DL models performed better than the ML models in terms of accuracy, but the ML models performed better than the DL models in terms of computation time. CNN with Word Embedding performed the best in terms of accuracy (96.34%), demonstrating the effectiveness of semantic analysis in phishing email detection.
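As a rough illustration of the pipeline this abstract describes, here is a minimal Keras sketch of a CNN text classifier with a word-embedding layer; the vocabulary size, layer widths and placeholder data are assumptions, not the authors' configuration.

```python
# Hypothetical sketch of a CNN email classifier with word embeddings;
# all hyperparameters and the random placeholder data are assumptions.
import numpy as np
from tensorflow.keras import layers, models

vocab_size, max_len = 20000, 500              # assumed vocabulary and email length
model = models.Sequential([
    layers.Input((max_len,)),                 # integer-encoded token sequence
    layers.Embedding(vocab_size, 128),
    layers.Conv1D(64, 5, activation="relu"),  # n-gram-like feature detectors
    layers.GlobalMaxPooling1D(),
    layers.Dense(32, activation="relu"),
    layers.Dense(1, activation="sigmoid"),    # phishing vs. non-phishing
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

X = np.random.randint(0, vocab_size, size=(100, max_len))  # placeholder emails
y = np.random.randint(0, 2, size=(100,))                   # placeholder labels
model.fit(X, y, epochs=1, batch_size=32)
```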

15 citations


Journal ArticleDOI
TL;DR: This study proposes a model to detect and discover emotions/opinions of YouTube users on herbal treatment videos through an analysis of user comments by using machine learning classifiers and introduces a new Arabic Dataset on Herbal Treatments for Diabetes (ADHTD).
Abstract: Social media platforms are extensively used in exchanging and sharing information and user experience, thereby resulting in massive outspread and viewing of personal experiences in many fields of life. Thus, informative health-related videos on YouTube are highly perceptible. Many users tend to procure medical treatments and health-related information from social media, particularly from YouTube, when searching for chronic illness treatments. Sometimes, these sources contain misinformation that causes fatal effects on the users’ health. Many sentiment analyses and classifications have been conducted on social media platforms to study user posts and comments in many life science fields. However, no study has been conducted on the analysis of Arabic user comments, which provide details on herbal treatments for people with diabetes. Therefore, this study proposes a model to detect and discover the emotions/opinions of YouTube users on herbal treatment videos through an analysis of user comments by using machine learning classifiers. In addition, a new Arabic Dataset on Herbal Treatments for Diabetes (ADHTD), which is based on user comments from several YouTube videos, is introduced. This study examines the impact of four representation methods on ADHTD to show the performance of machine learning classifiers. These methods remove repeating characters in the Arabic dialect and the character extension known as ‘TATAWEEL’ or ‘MAD’, stem Arabic words, remove Arabic stop words and apply N-grams to Arabic words. Experiments have been conducted based on the aforementioned methods to handle the imbalanced proposed dataset and identify the best machine learning classifiers over Arabic dialect textual data. The model achieved a higher accuracy, reaching 95%, when the Synthetic Minority Oversampling TEchnique (SMOTE) was used to balance the dataset than on the imbalanced dataset.
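The class-balancing step the abstract credits for the 95% accuracy can be sketched as follows; the synthetic data and classifier below are placeholders standing in for the ADHTD features, and SMOTE parameters are left at imblearn defaults.

```python
# Minimal sketch of balancing a dataset with SMOTE before training a
# classifier; data and classifier choice are placeholders.
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic imbalanced data standing in for the ADHTD comment features
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_train, y_train)
clf = LinearSVC().fit(X_bal, y_bal)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```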

13 citations


Journal ArticleDOI
TL;DR: This study compares the effectiveness of stock price and return as input features in directional forecasting models using 10-year historical data of ten large cap US companies and concludes that price is generally a more potent input feature than return value in predicting the direction of price movement.
Abstract: Forecasting directional movement of stock price using machine learning tools has attracted a considerable amount of research. Two of the most common input features in a directional forecasting model are stock price and return. The choice between the former and the latter variables is often subjective. In this study, we compare the effectiveness of stock price and return as input features in directional forecasting models. We perform an extensive comparison of the two input features using 10-year historical data of ten large-cap US companies. We employ four popular classification algorithms as the basis of the forecasting models used in our study. The results show that stock price is a more effective standalone input feature than return. The effectiveness of stock price and return equalizes when we add technical indicators to the input feature set. We conclude that price is generally a more potent input feature than return value in predicting the direction of price movement. Our results should aid researchers and practitioners interested in applying machine learning models to stock price forecasting.
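A minimal sketch of the experimental setup being compared: one feature matrix built from raw prices and one from returns over the same sliding windows, with a directional classifier scored on each. The random-walk series, window length and classifier are illustrative assumptions.

```python
# Sketch of the two candidate input features: raw prices vs. returns,
# with next-day direction as the label. All specifics are placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

prices = np.cumsum(np.random.randn(1000)) + 100   # placeholder price series
returns = np.diff(prices) / prices[:-1]           # simple returns

w = 10                                            # assumed window length
T = range(w, len(prices) - 1)
X_price = np.array([prices[t - w + 1 : t + 1] for t in T])  # last w prices
X_ret = np.array([returns[t - w : t] for t in T])           # last w returns
y = np.array([prices[t + 1] > prices[t] for t in T]).astype(int)

for name, X in [("price", X_price), ("return", X_ret)]:
    clf = LogisticRegression(max_iter=1000).fit(X[:800], y[:800])
    print(name, "direction accuracy:", round(clf.score(X[800:], y[800:]), 3))
```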

13 citations


Journal ArticleDOI
TL;DR: This paper proposes a stock prediction model using Generative Adversarial Network (GAN) with Gated Recurrent Units (GRU) used as a generator that inputs historical stock price and generates future stock prices and Convolutional Neural Network (CNN) as a discriminator to discriminate between the real stockprice and generated stock price.
Abstract: Deep learning is an exciting topic. It has been utilized in many areas owing to its strong potential. For example, it has been widely used in the financial area, which is vital to society, such as high-frequency trading, portfolio optimization, fraud detection and risk management. Stock market prediction is one of the most popular and valuable areas in finance. This paper proposes a stock prediction model using a Generative Adversarial Network (GAN), with Gated Recurrent Units (GRU) used as a generator that inputs historical stock prices and generates future stock prices, and a Convolutional Neural Network (CNN) as a discriminator to discriminate between the real stock price and the generated stock price. Unlike traditional methods, which limit forecasting to one step ahead, the deep learning algorithm makes it possible to conduct multi-step-ahead prediction more accurately. In this study, the Apple Inc. stock closing price was chosen as the target price, with features such as the S&P 500 index, NASDAQ Composite index, U.S. Dollar index, etc. In addition, FinBERT has been utilized to generate a news sentiment index for Apple Inc. as an additional predicting feature. Finally, this paper compares the proposed GAN model results with the baseline model.
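The generator/discriminator pairing described here can be sketched as follows; the window length, feature count, horizon and layer sizes are assumptions, and the adversarial training loop is omitted.

```python
# Hypothetical sketch of the GAN building blocks: a GRU generator that
# maps a window of historical features to future prices and a 1-D CNN
# discriminator; all dimensions are illustrative assumptions.
from tensorflow.keras import layers, models

n_steps, n_features, n_ahead = 30, 13, 3   # assumed window, features, horizon

def build_generator():
    return models.Sequential([
        layers.Input((n_steps, n_features)),
        layers.GRU(64),                        # summarize the history window
        layers.Dense(n_ahead),                 # multi-step-ahead price forecast
    ])

def build_discriminator():
    return models.Sequential([
        layers.Input((n_steps + n_ahead, 1)),  # history + (real or fake) future
        layers.Conv1D(32, 3, activation="relu"),
        layers.GlobalMaxPooling1D(),
        layers.Dense(1, activation="sigmoid"), # real vs. generated sequence
    ])

generator, discriminator = build_generator(), build_discriminator()
generator.summary()
discriminator.summary()
```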

13 citations


Journal ArticleDOI
TL;DR: An accurate, fast and reliable strawberry and cherry fruit detection and classification system for automated strawberry and cherry yield estimation; the fine-tuned MobileNet CNN model performs quite well, with higher fruit classification accuracy at less computational cost.
Abstract: This paper proposes an accurate, fast and reliable strawberry and cherry fruit detection and classification system for automated strawberry and cherry yield estimation. A state-of-the-art deep learning-based fine-tuned MobileNet Convolutional Neural Network is developed to detect and classify strawberry and cherry fruit types in the outdoor field. The proposed CNN model is trained on 4250 strawberry fruit images and 3878 cherry fruit images and tested on 990 strawberry fruit images and 1012 cherry fruit images. To capture features and classify fruit type, a fine-tuned MobileNet Convolutional Neural Network model is presented in this study. The original MobileNet CNN model has 88 layers, which is computationally intensive and has more parameters. In the fine-tuned MobileNet CNN model, top layers are frozen and a few layers are replaced with other layers such as a depthwise layer, pointwise layer, ReLU and Batch Normalization layer and a global average pooling layer. The fully connected layer is removed. The fine-tuned MobileNet CNN model performs quite well, with higher fruit classification accuracy at less computational cost. The proposed CNN model performs classification and labels the fruits as Blueberry, Huckleberry, Mulberry, Raspberry, Strawberry, Strawberry Wedge, Cherry Brown, Cherry Red, Cherry Rainier, Cherry Wax Black, Cherry Wax Red and Cherry Wax Yellow. The proposed model's average validation accuracy is about 98.60% and the loss rate is about 0.38%. The fruit images acquired from the cultivation field include fruits that are occluded by foliage, under shadow and with some degree of overlap of strawberry and cherry flowers.
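A hedged sketch of the transfer-learning recipe the abstract outlines: a frozen MobileNet base, added depthwise/pointwise, BatchNormalization and ReLU layers, global average pooling instead of a fully connected layer and a 12-way softmax head. The exact replaced layers and sizes are assumptions.

```python
# Sketch of fine-tuning MobileNet roughly as described; layer choices
# beyond what the abstract names are assumptions.
from tensorflow.keras import layers, models
from tensorflow.keras.applications import MobileNet

base = MobileNet(weights="imagenet", include_top=False,
                 input_shape=(224, 224, 3))
base.trainable = False                        # freeze pretrained layers

model = models.Sequential([
    base,
    layers.DepthwiseConv2D(3, padding="same"),
    layers.Conv2D(256, 1),                    # pointwise convolution
    layers.BatchNormalization(),
    layers.ReLU(),
    layers.GlobalAveragePooling2D(),          # replaces the fully connected layer
    layers.Dense(12, activation="softmax"),   # 12 strawberry/cherry classes
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```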

12 citations


Journal ArticleDOI
TL;DR: This research uses a machine learning technique in which the SQL injection detector is trained on a training dataset and then evaluated against a testing dataset; the results show that the proposed technique produces high accuracy in recognizing malicious and benign web requests.
Abstract: A lack of secure code in web apps leads to cyber-attacks through vulnerabilities. Statistics show that the highest number of data-theft-related cyber-attacks occur through the SQL injection technique. Hence, effective SQL injection detection is needed in any web system to combat this threat. In this research, a machine learning technique is used in which the SQL injection detector is trained on a training dataset and then evaluated against a testing dataset. The research relies on the preparation of the training and testing datasets. The training set is used by the detector to establish the knowledge base and the test set is used to evaluate the performance of the detector. The result of the detection shows that the proposed technique produces high accuracy in recognizing malicious and benign web requests.
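A minimal sketch of such a detector: vectorize raw web requests and train a classifier on labeled benign/malicious examples. Character n-grams and Naive Bayes are one common choice and an assumption here; the tiny inline dataset is purely illustrative.

```python
# Sketch of an ML-based SQL injection detector on raw request strings.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_requests = [
    "id=42&name=alice",                       # benign
    "q=books&page=2",                         # benign
    "id=1' OR '1'='1",                        # SQL injection
    "user=admin'; DROP TABLE users;--",       # SQL injection
]
train_labels = [0, 0, 1, 1]

detector = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(1, 3)),  # char n-grams
    MultinomialNB(),
)
detector.fit(train_requests, train_labels)
print(detector.predict(["id=7' UNION SELECT password FROM users--"]))
```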

10 citations


Journal ArticleDOI
TL;DR: The results demonstrate that the state of usability attributes of the PSAU mobile application is acceptable, and should provide IT deanships and related policymakers at the university level with empirical evidence about the issues and problems that face users of mobile applications in higher educational institutions in KSA.
Abstract: This study investigates the extent to which the usability attributes, namely, effectiveness, efficiency, learnability and memorability, satisfaction, errors and cognitive load, of the PSAU mobile application exist from the point of view of students who were enrolled in the academic year 2019-2020 in the College of Business Administration (CBA) at Prince Sattam Bin Abdulaziz University. The study employs the People at the Center of Mobile Application Development (PACMAD) usability model to determine the extent to which the usability attributes of the PSAU mobile application are available. A survey-based methodology is used to collect data from a random sample of 137 enrolled students in the College of Business Administration (CBA) at Prince Sattam bin Abdulaziz University. The results demonstrate that the state of usability attributes of the PSAU mobile application is acceptable; the highest mean was 3.3 for the cognitive load dimension, followed by the learnability and memorability dimensions with a mean of 3.0. The lowest mean is 2.4 for the efficiency dimension. The overall mean for usability is 2.8, which reflects the level of usability of the PSAU mobile application. The results of this study should provide IT deanships and related policymakers at the university level with empirical evidence about the issues and problems that face users of mobile applications in higher educational institutions in KSA, and help in developing high-quality mobile applications.

6 citations



Journal ArticleDOI
TL;DR: A Systematic Literature Review (SLR) of software defect prediction using deep learning models focused on identifying the studies that use the semantics of the source code for improving defect prediction.
Abstract: The approaches associated with software defect prediction are used to reduce the time and cost of discovering software defects in source code and to improve software quality in organizations. There are two approaches to reveal the software defects in the source code. The first approach concentrates on traditional features such as lines of code, code complexity, etc. However, these features fail to extract the semantics of the source code. The second one concentrates on revealing these semantics. This paper presents a Systematic Literature Review (SLR) of software defect prediction using deep learning models. This SLR is focused on identifying the studies that use the semantics of the source code for improving defect prediction. It aims to analyze the datasets, models and frameworks used, and to identify the evaluation metrics to ensure their applicability in software defect prediction. The IEEE Xplore, Scopus and Web of Science digital libraries were used to select the suitable primary studies. Forty (40) primary studies published by 15 December 2020 were selected for analysis based on the quality criteria. The project levels applied in the studies were: Within-project 52.5%, cross-project 17.5% and both within-project and cross-project 30%. The datasets used were: The Promise dataset 68.18% and other datasets 31.82%. The most used deep learning model in the primary studies was the Convolutional Neural Network (CNN), at 35%. The most used evaluation metrics were F-measure and Area Under the Curve (AUC). Software defect prediction using deep learning models is still a valuable topic and requires many more research studies to enhance the performance of defect prediction.

5 citations




Journal ArticleDOI
TL;DR: This study focused on finding the best approach for sentiment analysis using a series of tweets related to Hajj, one of the most important rituals performed by Muslims, for which the companies responsible for the pilgrimage season seek to complete the season in the best way every year.
Abstract: About forty-five percent of the world's population uses social networks, making these platforms a natural place to find people's opinions and feelings on various topics. Companies that offer their services and products to customers focus on this subject for future improvement. Thus, serious thinking began on analyzing the views of people across different social platforms and on developing the best ways to analyze these views. In this study, we focused on finding the best approach for sentiment analysis using a series of Hajj-related tweets; Hajj is one of the most important rituals performed by Muslims, for which the companies responsible for the pilgrimage season seek to complete the season in the best way every year. We used Support Vector Machine (SVM), K-Nearest Neighbor (KNN) and Naïve Bayes (NB) as supervised algorithms for the machine-learning approach and the TextBlob analyzer for the lexicon-based approach. Findings show that machine learning techniques worked better than the lexicon approach in the classification and analysis of Hajj-related tweets. Even with the limited availability of a Hajj tweets corpus dataset, SVM reached the best accuracy, which was 84%.
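The two approaches being compared can be sketched side by side; the toy English tweets below are placeholders for the labeled Arabic Hajj corpus.

```python
# Sketch contrasting a supervised SVM classifier with the lexicon-based
# TextBlob analyzer; examples and labels are illustrative placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC
from textblob import TextBlob

tweets = ["the pilgrimage services were excellent",
          "long delays and poor organization",
          "very smooth and well organized season",
          "terrible crowding at the site"]
labels = [1, 0, 1, 0]                          # 1 = positive, 0 = negative

svm = make_pipeline(TfidfVectorizer(), LinearSVC()).fit(tweets, labels)
print("SVM:", svm.predict(["excellent organization this year"]))

# Lexicon-based approach: polarity in [-1, 1], no training required
print("TextBlob:", TextBlob("excellent organization this year").sentiment.polarity)
```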

Journal ArticleDOI
TL;DR: Two metrics designed to measure the data uniformity of acceptance tests in FitNesse and Gherkin notations are presented to identify projects with lots of random and meaningless data; the results suggest that test data are more irregular in FitNesse features than in Gherkin features.
Abstract: This paper presents two metrics designed to measure the data uniformity of acceptance tests in FitNesse and Gherkin notations. The objective is to measure the data uniformity of acceptance tests in order to identify projects with lots of random and meaningless data. Random data in acceptance tests hinder communication between stakeholders and increase the volume of glue code. The main contribution of this paper is the implementation of the proposed metrics. This paper also evaluates the uniformity of test data from several FitNesse and Gherkin projects found on GitHub, as a means to verify if the metrics are applicable. First, the metrics were applied to 18 FitNesse project repositories and 18 Gherkin project repositories. The measurements taken from these repositories were used to present cases of irregular and uniform test data. Then, we have compared the notations from FitNesse and Gherkin in terms of projects and features. In terms of projects, no significant difference was observed, that is, FitNesse projects have a level of uniformity similar to Gherkin projects. However, in terms of features and test documents, there was a significant difference. The uniformity scores of FitNesse and Gherkin features are 0.16 and 0.26, respectively. These uniformity scores are very low, which means that test data for both notations are very irregular. Thus, we can infer that test data are more irregular in FitNesse features than in Gherkin features. The evaluation also shows that 28 of 36 projects (78%) did not reach the minimum recommended measure, i.e., 0.45 of test data uniformity. In general, we can observe that there are still many challenges in improving the quality of acceptance tests, especially in relation to the uniformity of test data.

Journal ArticleDOI
TL;DR: It is suggested that IT outsourcing vendors focus on these factors, policymakers implement new strategies to establish their presence in global outsourcing industries and researchers incorporate these variables in their future research.
Abstract: Outsourcing gained popularity as several large U.S. companies in the 1980s began delegating IT work to foreign firms. In outsourcing, one party (the customer) asks another party (the vendor) to do a particular job. Outsourcing has its own advantages, like reducing cost and time and increasing performance and satisfaction. This study has focused on the critical success factors in information technology outsourcing for the emerging market and has considered the perspectives of the vendor. A snowball sampling technique was used to generate quantitative data among respondents inside the Kathmandu valley and variables were drawn from the available literature. Respondents included outsourcing vendors, freelancers, consultants and policymakers. Data were properly tested for reliability using Cronbach's Alpha and results were validated using convergent and discriminant validity. The analysis included Structural Equation Modeling and estimation was done using maximum likelihood and partial least squares. The study identified 21 critical success factors for the emerging market under seven categories: System quality, communication quality, service quality, system use, satisfaction, individual benefit and organizational benefit, which is the main contribution of the paper. It is suggested that IT outsourcing vendors focus on these factors, policymakers implement new strategies to establish their presence in global outsourcing industries and researchers incorporate these variables in their future research.
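The Cronbach's Alpha reliability test mentioned above follows a standard formula that can be computed directly; the toy item-response matrix below is illustrative.

```python
# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total variance),
# computed on a toy matrix (rows = respondents, columns = survey items).
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)     # variance of total scores
    return k / (k - 1) * (1 - item_vars / total_var)

responses = np.array([[4, 5, 4], [3, 3, 4], [5, 5, 5], [2, 3, 2], [4, 4, 5]])
print(round(cronbach_alpha(responses), 3))
```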

Journal ArticleDOI
TL;DR: A Word Sense Disambiguation method is proposed and used in finding a sense-oriented sentence semantic relatedness measure, which combines an edge-based score between words depending on the context of the sentence, a sense-based score which finds sentences having similar senses and a word order score.
Abstract: Finding the semantic relatedness score between two sentences is useful in many research areas. Existing relatedness methods do not consider word sense while computing the semantic relatedness score between two sentences. In this study, a Word Sense Disambiguation (WSD) method is proposed and used in finding a sense-oriented sentence semantic relatedness measure. The WSD method is used to find the correct sense of a word present in a sentence. The proposed method uses both the WordNet lexical dictionary and the Wikipedia corpus. The sense-oriented sentence semantic relatedness measure combines an edge-based score between words depending on the context of the sentence, a sense-based score which finds sentences having similar senses and a word order score. We have evaluated the proposed WSD method on publicly available English WSD corpora. We have compared our proposed sense-oriented sentence semantic relatedness measure on standard datasets. Experimental analysis illustrates the significance of the proposed method over many baseline and current systems like Lesk, UKB, IMS and Babelfy.
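For context, the Lesk baseline the paper compares against is available in NLTK; a minimal usage sketch with WordNet senses:

```python
# Dictionary-based WSD with NLTK's Lesk implementation.
# Requires: nltk.download("wordnet"); nltk.download("punkt")
from nltk.tokenize import word_tokenize
from nltk.wsd import lesk

sentence = "I went to the bank to deposit my money"
sense = lesk(word_tokenize(sentence), "bank", pos="n")  # pick a WordNet synset
print(sense, "-", sense.definition())
```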

Journal ArticleDOI
TL;DR: A deep learning framework to perform face verification in videos is proposed and an average two percent increase is obtained in the accuracy of the face verification models by applying these methods.
Abstract: Person re-identification in surveillance camera videos is attracting widespread interest due to its increasing number of applications. It is being applied in the fields of security, healthcare, product manufacturing, product sales and more. Though there are a variety of methods for person re-identification, face verification-based methods are very effective. In this study, a deep learning framework to perform face verification in videos is proposed. Face verification deep learning model development includes different stages like face recognition, cropping, alignment, augmentation, image enhancement and face image selection for model training. The authors have put forward innovative methods to be adopted in various stages of this sequence to improve the performance of the models. The focus of this study is on these image preprocessing stages of the process, rather than the deep learning part, which makes the approach unique. The overall model is improved by increasing the efficiency of each of these stages, by adopting methods like face recognition and cropping based on face landmarks, effective training image selection using face landmark symmetry, various image augmentation techniques including perspective transformation and image enhancement methods like contrast stretching and histogram equalization. An average two percent increase is obtained in the accuracy of the face verification models by applying these methods.
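Two of the named enhancement steps, histogram equalization and contrast stretching, can be sketched with OpenCV and NumPy; the file path and percentile cut-offs are placeholders.

```python
# Sketch of two image enhancement steps from the abstract; "face.jpg"
# is a placeholder input path and the 2/98 percentiles are assumptions.
import cv2
import numpy as np

img = cv2.imread("face.jpg", cv2.IMREAD_GRAYSCALE)

equalized = cv2.equalizeHist(img)             # histogram equalization

lo, hi = np.percentile(img, (2, 98))          # contrast stretching to [0, 255]
stretched = np.clip((img - lo) * 255.0 / (hi - lo), 0, 255).astype(np.uint8)

cv2.imwrite("face_equalized.jpg", equalized)
cv2.imwrite("face_stretched.jpg", stretched)
```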

Journal ArticleDOI
TL;DR: An analysis of one of the fundamental parts of Advanced Persistent Threat (APT) attacks and the importance of attribution in the analysis of APTs, with an extension developed for the MICTIC framework.
Abstract: This paper analyzes one of the fundamental parts of Advanced Persistent Threat (APT) attacks: the phases of APTs and their framing with the identification of the criminals. This type of attack normally requires resources only available to a state (state hacking). The paper discusses the importance of attribution in the analysis of APTs and the unique and differentiating characteristics of this type of attack compared to traditional cyber-attacks. It develops an extension for one of the few frameworks applied to attribution in APTs, the MICTIC.

Journal ArticleDOI
TL;DR: It could be concluded that BiLSTM is the best architecture suited for automatic music transcription.
Abstract: Automatic Music Transcription (AMT) is becoming more and more popular by the day, and it has piqued the interest of many beyond academic research. A successful AMT system would be able to bridge multiple ranges of interactions between people and music, including music education. The goal of this research is to transcribe an audio input into music notation. The research was conducted by training multiple neural network architectures on different kinds of cases. The evaluation used two approaches: Objective evaluation and subjective evaluation. The result of this research was an achievement of a 74.80% F1 score, and 73.3% of 30 respondents claimed that Bidirectional Long Short-Term Memory (BiLSTM) has the best result. It can be concluded that BiLSTM is the architecture best suited for automatic music transcription.
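A hedged sketch of a frame-wise BiLSTM transcription model of the kind evaluated here; the spectrogram dimensions and the 88-key piano output are assumptions, not the paper's exact architecture.

```python
# Hypothetical BiLSTM transcription model: spectrogram frames in,
# per-frame multi-label note activations out; all sizes are illustrative.
from tensorflow.keras import layers, models

n_frames, n_bins, n_notes = 100, 229, 88

model = models.Sequential([
    layers.Input((n_frames, n_bins)),            # spectrogram frames
    layers.Bidirectional(layers.LSTM(128, return_sequences=True)),
    layers.TimeDistributed(layers.Dense(n_notes, activation="sigmoid")),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()
```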

Journal ArticleDOI
TL;DR: This study provides an analysis of what AGI protection scholars have written about the essence of human beliefs and proposes several well-supported hypotheses to indicate the difficulty of describing the character of human beliefs, arguing that a few meta-level theories are needed.
Abstract: The defense sphere of Artificial General Intelligence (AGI) is developing exponentially. Notwithstanding, the character of human beliefs pertaining to AGI associations remains under-defined. Different AGI protection scholars have formulated numerous hypotheses regarding the existence of human beliefs, but contradictions exist. This study provides an analysis of what AGI protection scholars, up to the beginning of 2019, have written about the essence of human beliefs. It is generally advised to use a theory classification system, where the ideas are evaluated according to the degree of their sophistication, their position on the behaviorist-internalist spectrum and the scope of their consensus about mankind. We propose several well-supported hypotheses to indicate the difficulty of describing the character of human beliefs and argue that a few meta-level theories are needed.

Journal ArticleDOI
TL;DR: This study has shown that, for effective CBM application in industry, there is a need to develop a systematic methodology for design and selection of adequate data preparation steps and techniques with the proposed ML algorithms.
Abstract: Using Machine Learning (ML) prediction to achieve a successful, cost-effective, Condition-Based Maintenance (CBM) strategy has become very attractive in the context of Industry 4.0. In other fields, it is well known that in order to benefit from the prediction capability of ML algorithms, the data preparation phase must be well conducted. Thus, the objective of this paper is to investigate the effect of data preparation on the ML prediction accuracy of Gas Turbine (GT) performance decay. First, a data cleaning technique for robust Linear Regression imputation is proposed based on Mixed Integer Linear Programming. Then, experiments are conducted to compare the effect of commonly used data cleaning, normalization and reduction techniques on the ML prediction accuracy. Results revealed that the best prediction accuracy of GT decay, found with the k-Nearest Neighbors ML algorithm, considerably deteriorates when changing the data preparation steps and/or techniques. This study has shown that, for effective CBM application in industry, there is a need to develop a systematic methodology for the design and selection of adequate data preparation steps and techniques with the proposed ML algorithms.
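The core experiment, the same ML model scored under different data preparation pipelines, can be sketched as follows; the synthetic data and pipeline choices are placeholders, not the paper's GT dataset or exact steps.

```python
# Sketch: one k-NN model evaluated under several preparation pipelines.
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X, y = make_regression(n_samples=500, n_features=16, noise=5, random_state=0)

pipelines = {
    "raw": make_pipeline(KNeighborsRegressor()),
    "standardized": make_pipeline(StandardScaler(), KNeighborsRegressor()),
    "minmax+pca": make_pipeline(MinMaxScaler(), PCA(n_components=8),
                                KNeighborsRegressor()),
}
for name, pipe in pipelines.items():
    print(name, cross_val_score(pipe, X, y, cv=5).mean().round(3))
```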



Journal ArticleDOI
TL;DR: This article focuses on the group of cancer patients, comprehensively considers the social, medical resources and family issues to analyze the possible impact of the epidemic on cancer patients' drug treatment and health and makes recommendations forcancer patients' management.
Abstract: Since December 2019, many unexplained viral pneumonia cases have been found in Wuhan City, Hubei Province, China. It was later confirmed that the outbreak's causative agent was a new coronavirus. The virus was temporarily named "2019-novel coronavirus" (2019-nCoV) by the World Health Organization (WHO). The disease caused by 2019-nCoV was called "new coronavirus pneumonia" (Novel Coronavirus Pneumonia, NCP) by the National Health Commission of China and was named "Coronavirus disease 2019" (COVID-19) by the WHO. The outbreak of NCP seriously affected the lives of the public. This article focuses on the group of cancer patients and comprehensively considers social and medical resources and family issues to analyze the possible impact of the epidemic on cancer patients' drug treatment and health, making recommendations for cancer patients' management.

Journal ArticleDOI
TL;DR: The proposed Interval Type-2 fuzzy association rule mining approach is able to eliminate redundant rules, which reduces the number of generated rules by 39.5% and memory usage by 22.6%, and discovers associative rules with a minimum number of symptoms at confidence values as high as 91%.
Abstract: In the literature, several methods explored to analyze breast cancer datasets have failed to sufficiently handle the quantitative attribute sharp-boundary problem and to resolve inter- and intra-uncertainties in breast cancer dataset analysis. In this study, an Interval Type-2 fuzzy association rule mining approach is proposed for pattern discovery in a breast cancer dataset. In the first part of this analysis, the Interval Type-2 fuzzification of the breast cancer dataset is carried out using the Hao and Mendel approach. In the second part, the FP-growth algorithm is adopted for associative pattern discovery from the fuzzified dataset of the first part. To define the intuitive words for breast cancer determinant factors and the expert data intervals, thirty (30) medical experts from specialized hospitals were consulted through a questionnaire polling method. To establish the adequacy of the linguistic words defined by the experts, the Jaccard similarity measure is used. This analysis is able to discover associative rules with a minimum number of symptoms at confidence values as high as 91%. It also identifies High Bare Nuclei and High Uniformity of Cell Shape as strong determinant factors for diagnosing breast cancer. The proposed approach performed better in terms of rules generated when compared with traditional quantitative association rule mining. It is able to eliminate redundant rules, which reduces the number of generated rules by 39.5% and memory usage by 22.6%. The discovered rules are viable for building a comprehensive and compact expert-driven knowledge base for breast cancer decision support or expert systems.
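The second stage, FP-growth followed by rule extraction, can be sketched with mlxtend on one-hot transactions; the toy items below merely stand in for the fuzzified breast cancer attributes.

```python
# Sketch of FP-Growth plus association-rule extraction via mlxtend;
# transactions and thresholds are illustrative placeholders.
import pandas as pd
from mlxtend.frequent_patterns import association_rules, fpgrowth

transactions = pd.DataFrame({
    "high_bare_nuclei":     [1, 1, 0, 1, 1],
    "high_cell_shape_unif": [1, 1, 0, 1, 0],
    "malignant":            [1, 1, 0, 1, 0],
}).astype(bool)

frequent = fpgrowth(transactions, min_support=0.4, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.9)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```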

Journal ArticleDOI
TL;DR: The aim of the paper was to show how a static data mining model is used to extract defects together with the PSO algorithm, and to develop an optimized software flaw prediction system based on the data mining techniques Association Rule Mining, Decision Tree, Naive Bayes and Classification integrated with the Particle Swarm Optimization technique.
Abstract: The implementation of software must produce flawless results without any inadequacies. A software imperfection evaluation scheme determines defective components in software. The final product should have minor or negligible shortcomings in order to yield high-quality software. Software quality metrics are a division of software metrics that spotlight the quality aspects of the product. A software flaw prediction system helps in the early discovery of flaws, contributing to their timely removal and producing a quality software system through numerous metrics. The aim of the paper was to show how a static data mining model is used to extract defects together with the PSO algorithm. Another aim of the research was to develop an optimized software flaw prediction system based on the data mining techniques Association Rule Mining, Decision Tree, Naive Bayes and Classification integrated with the Particle Swarm Optimization technique. The proposed software flaw prediction system, built on data mining techniques with the Particle Swarm Optimization algorithm, has been verified and the results compared. The proposed system is very useful for identifying the relationships between quality metrics and potentially defective modules. The optimized data mining systems achieved accurate prediction of these defective modules. In the future, optimized data mining systems can be improved by the use of different platforms and particularly by improving data mining using PSO algorithms. It is necessary to develop algorithms that can identify faults in advance, which will minimize costs and promote the quality of developed software systems. Future optimized data mining systems will improve the relationship between quality metrics and potentially defective modules, which will lead to improved performance, productivity and lower operation costs.
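The Particle Swarm Optimization component follows a standard update loop, sketched below on a toy objective; the inertia and acceleration coefficients are common defaults, not the paper's settings.

```python
# Minimal PSO loop minimizing a toy quadratic; w, c1, c2 are assumptions.
import numpy as np

def pso(f, dim=2, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5):
    rng = np.random.default_rng(0)
    x = rng.uniform(-5, 5, (n_particles, dim))      # positions
    v = np.zeros_like(x)                            # velocities
    pbest, pbest_val = x.copy(), np.apply_along_axis(f, 1, x)
    gbest = pbest[pbest_val.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = x + v
        vals = np.apply_along_axis(f, 1, x)
        better = vals < pbest_val
        pbest[better], pbest_val[better] = x[better], vals[better]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest, pbest_val.min()

print(pso(lambda p: (p ** 2).sum()))
```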

Journal ArticleDOI
TL;DR: This literature review paper has analyzed topic modeling methods from different aspects and identified the research gap between topic modeling in English and Bangla language and identified several types of topic modeling techniques, such as Latent Dirichlet Allocation (LDA), Latent Semantic Analysis (LSA), Support Vector Machine (SVM), Bi-term Topic Modeling (BTM).
Abstract: Due to the enormous growth of information and technology, digitized texts and data are being generated at an immense rate. Therefore, identifying the main topics in a vast collection of documents by humans is merely impossible. Topic modeling is a statistical framework that infers the latent and underlying topics from text documents, corpora, or electronic archives through a probabilistic approach. It is a promising field in Natural Language Processing (NLP). Though many researchers have researched this field, only a little significant research has been done for Bangla. In this literature review paper, we have followed a systematic approach for reviewing topic modeling studies published from 2003 to 2020. We have analyzed topic modeling methods from different aspects and identified the research gap between topic modeling in the English and Bangla languages. After analyzing these papers, we have identified several types of topic modeling techniques, such as Latent Dirichlet Allocation (LDA), Latent Semantic Analysis (LSA), Support Vector Machine (SVM) and Bi-term Topic Modeling (BTM). Furthermore, this review paper also highlights the real-world applications of topic modeling. Several evaluation methods were used to evaluate these models’ performances, which we have discussed in this study. We conclude by mentioning the huge future research scope for topic modeling in Bangla.
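A minimal sketch of LDA, the most widely used of the listed techniques, on a toy corpus; a Bangla corpus would need its own tokenization.

```python
# LDA topic modeling with scikit-learn on an illustrative toy corpus.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = ["cricket match score runs wicket",
        "election vote party government minister",
        "runs wicket innings cricket team",
        "minister parliament government policy vote"]

counts = CountVectorizer().fit(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts.transform(docs))

terms = counts.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[-3:]]
    print(f"topic {k}:", top)
```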

Journal ArticleDOI
TL;DR: The results show that ECCHC performs poorly in the security analysis on grayscale and RGB images, which leads to the conclusion that it is not suitable for encrypting grayscale and RGB images.
Abstract: The advancement of communication technology helps individuals share images over the Internet. However, sharing through insecure channels may expose the images to certain attacks that will compromise their confidentiality. Image encryption is one of the methods used to protect against this confidentiality threat. The Hill Cipher has been applied in image encryption because of its simple operation and fast computation, but it also possesses a weak security level, as it requires the sender and receiver to use and share the same private key over an insecure channel. Thus, many solutions have been proposed utilizing hybrid approaches to the Hill Cipher, one of which is the Elliptic Curve Cryptosystem together with Hill Cipher (ECCHC), to utilize the beauty of the Hill Cipher while managing its weaknesses. However, ECCHC had only been tested on four images, which leads to inaccuracy of the results. Thus, this study extended the experiments to 209 images from the USC-SIPI database in order to investigate the efficiency of ECCHC. The results show that ECCHC performs poorly in the security analysis on grayscale and RGB images, which leads to the conclusion that it is not suitable for encrypting grayscale and RGB images.
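The Hill Cipher core that ECCHC builds on is a block-wise matrix multiplication modulo 256, sketched below; the key matrix is an illustrative choice (its determinant must be coprime with 256, i.e., odd, for decryption to be possible).

```python
# Sketch of the basic Hill Cipher step on image data: encrypt pixel
# blocks by multiplying with a key matrix modulo 256.
import numpy as np

key = np.array([[3, 2], [5, 7]])   # det = 11, odd => invertible mod 256

def hill_encrypt(pixels: np.ndarray, key: np.ndarray) -> np.ndarray:
    flat = pixels.astype(np.int64).reshape(-1, key.shape[0])  # pixel blocks
    return (flat @ key % 256).astype(np.uint8).reshape(pixels.shape)

img = np.random.randint(0, 256, (4, 4), dtype=np.uint8)  # placeholder image
print(hill_encrypt(img, key))
```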

Journal ArticleDOI
TL;DR: This study presents a two-stage heuristic feature selection method to classify sports articles using Tabu search and Cuckoo search via Levy flight, and shows significant improvements in the overall accuracy.
Abstract: Sentiment analysis is one of the most popular domains for natural language text classification, crucial for improving information extraction. However, massive data availability is one of the biggest problems for opinion mining due to accuracy considerations. Selecting high discriminative features from an opinion mining database is still an ongoing research topic. This study presents a two-stage heuristic feature selection method to classify sports articles using Tabu search and Cuckoo search via Levy flight. Levy flight is used to prevent the solution from being trapped at local optima. Comparative results on a benchmark dataset prove that our method shows significant improvements in the overall accuracy from 82.6% up to 89.5%.
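The Lévy flight component the abstract credits with escaping local optima is commonly generated with Mantegna's algorithm, sketched below; beta = 1.5 is a conventional choice, not necessarily the paper's.

```python
# Lévy flight step sizes via Mantegna's algorithm, as used in Cuckoo search.
import math
import numpy as np

def levy_step(dim: int, beta: float = 1.5, rng=np.random.default_rng()):
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta
                * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0, sigma, dim)
    v = rng.normal(0, 1, dim)
    return u / np.abs(v) ** (1 / beta)        # heavy-tailed step sizes

print(levy_step(5))
```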

Journal ArticleDOI
TL;DR: This study presents a Sequence-to-Sequence (Seq2Seq) parsing model for the NL to SQL task, powered by the Transformers Architecture exploring the two Language Models (LM): Text-To-Text Transfer Transformer (T5) and the Multilingual pre-trained Text- to-Text Trans transformer (mT5).
Abstract: Using Natural Language (NL) to interact with relational databases allows users from any background to easily query and analyze large amounts of data. This requires a system that understands user questions and automatically converts them into a structured query language such as SQL. The best performing Text-to-SQL systems use supervised learning (usually formulated as a classification problem) by approaching this task as a sketch-based slot-filling problem, or by first converting questions into an Intermediate Logical Form (ILF) and then converting it to the corresponding SQL query. However, non-supervised modeling that directly converts questions to SQL queries has proven more difficult. In this sense, we propose an approach to directly translate NL questions into SQL statements. In this study, we present a Sequence-to-Sequence (Seq2Seq) parsing model for the NL-to-SQL task, powered by the Transformer architecture, exploring two Language Models (LM): The Text-To-Text Transfer Transformer (T5) and the Multilingual pre-trained Text-To-Text Transformer (mT5). Besides, we adopt the transformation-based learning algorithm to update the aggregation predictions based on association rules. The resulting model achieves a new state-of-the-art on the WikiSQL dataset for weakly supervised SQL generation.
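A hedged sketch of Seq2Seq inference with T5 via Hugging Face transformers; "t5-small" is a generic checkpoint and the prompt format is an assumption, not the authors' fine-tuned WikiSQL model.

```python
# Seq2Seq inference skeleton for an NL-to-SQL style prompt; a model
# fine-tuned on WikiSQL would replace the generic checkpoint below.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

question = ("translate to SQL: How many players are taller than 190 cm? "
            "columns: name, height_cm, team")      # assumed prompt format
inputs = tokenizer(question, return_tensors="pt")
outputs = model.generate(**inputs, max_length=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```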

Journal ArticleDOI
TL;DR: The proposed Improved FP-tree construction algorithm has immensely improved tree construction time by resourcefully using the node links maintained in the header table to manage the list of same-item nodes in the FP-tree.
Abstract: Incremental mining of frequent patterns has attracted the attention of researchers in the last two decades. Researchers have explored the frequent pattern mining from incremental database problems by considering that the complete database to be processed can be accommodated in the system's main memory even after the database is updated very frequently. The FP-tree-based approaches were able to draw more interest because of their compact representation and requirement of a minimum number of database scans. Researchers have developed a few FP-tree-based methods to handle the incremental scenario by adjusting or restructuring the tree prefix paths. Although these approaches have managed to solve the re-computation problem by constructing a complete pattern tree data structure using only one database scan, restructuring the prefix paths for each transaction is a computationally costly task, leading to high tree construction time. If the FP-tree construction process can be supported with suitable data structures, reconstruction of the FP-tree from scratch may be less time consuming than the restructuring approaches in the incremental scenario. In this study, we have proposed a tree data structure called the Improved Frequent Pattern tree (Improved FP-tree). The proposed Improved FP-tree construction algorithm has immensely improved tree construction time by resourcefully using the node links maintained in the header table to manage the list of same-item nodes in the FP-tree. The experimental results emphasize the significance of the proposed Improved FP-tree construction algorithm over a few conventional incremental FP-tree construction algorithms with prefix path restructuring.
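The header-table node-link idea can be sketched in a few lines: each new tree node is prepended to its item's chain, so all same-item nodes are reachable without traversing the tree. This is a generic FP-tree sketch, not the paper's Improved FP-tree algorithm.

```python
# FP-tree nodes chained through node links held in a header table.
class FPNode:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count, self.children, self.link = 1, {}, None  # link -> next same-item node

def insert(transaction, root, header):
    node = root
    for item in transaction:
        if item in node.children:
            node.children[item].count += 1
        else:
            child = FPNode(item, node)
            node.children[item] = child
            # prepend the new node to the item's node-link chain
            child.link, header[item] = header.get(item), child
        node = node.children[item]

root, header = FPNode(None, None), {}
for t in [["a", "b"], ["a", "c"], ["a", "b", "c"]]:
    insert(t, root, header)

n = header["b"]                   # walk all "b" nodes via the node links
while n:
    print("b node with count", n.count)
    n = n.link
```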

Journal ArticleDOI
TL;DR: A CDR-based CA modelling that involves the Cumulonimbus clouds by considering three airplane maneuvers, i.e., Velocity, angle Turn and Altitude level Change (VTAC) and results show that collisions between airplanes and clouds can be avoided with minimum change of the initial airplane velocity, angle and altitude levels.
Abstract: An Air Traffic Controller (ATC) system aims to manage airline traffic to prevent airplane collisions, a task called Collision Avoidance (CA). The study of CA, called Conflict Detection and Resolution (CDR), becomes more critical as airline traffic grows significantly each year. Previous studies used optimization algorithms for CDR and did not involve the presence of cumulonimbus clouds. Many such clouds can be found in tropical regions like Indonesia. Therefore, involving such clouds in CDR optimization algorithms will be significant in Indonesia. We developed a CDR-based CA model that involves Cumulonimbus (CB) clouds by considering three airplane maneuvers, i.e., Velocity, angle Turn and Altitude level Change (VTAC). Our optimization algorithm is developed based on a Mixed-Integer Programming (MIP) solver due to its efficiency. This proposed algorithm requires two input data, namely the initial airplane and cloud states and the flight parameters such as velocity, angle and altitude levels. The outputs of our VTAC optimization algorithm are the optimum speed, altitude and angle turn of an airplane, determined based on the currently calculated variables. Extensive experiments have been conducted to validate the proposed approach and the experiment results show that collisions between airplanes and clouds can be avoided with minimum change of the initial airplane velocity, angle and altitude levels. The VTAC algorithm produced a longer distance to avoid collision between airplanes, by at least 1 Nautical Mile (NM), compared to the VAC algorithm. The addition of the angle maneuver in the VTAC algorithm improved the result significantly.
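A toy MIP in the spirit of the resolution step, using PuLP: choose minimal speed adjustments for two airplanes so their along-track gap stays above a separation minimum. The numbers and single-constraint model are illustrative only, far simpler than the paper's VTAC formulation.

```python
# Toy conflict-resolution program: minimize total speed-change magnitude
# subject to a separation constraint; all values are placeholders.
from pulp import LpMinimize, LpProblem, LpVariable, lpSum

prob = LpProblem("conflict_resolution", LpMinimize)
dv = [LpVariable(f"dv{i}", -50, 50) for i in range(2)]   # speed changes (knots)
mag = [LpVariable(f"mag{i}", 0) for i in range(2)]       # |dv_i| via linearization

t, gap0, sep = 0.1, 4.0, 5.0       # hours ahead, current gap (NM), required gap (NM)
prob += gap0 + t * (dv[0] - dv[1]) >= sep   # keep at least 5 NM apart at time t
for i in range(2):                           # mag_i >= |dv_i|
    prob += mag[i] >= dv[i]
    prob += mag[i] >= -dv[i]
prob += lpSum(mag)                           # objective: minimal total maneuver
prob.solve()
print([v.value() for v in dv])
```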