
Showing papers in "International Journal of Computer Applications in 2018"


Journal ArticleDOI
TL;DR: The use of TF-IDF (term frequency-inverse document frequency) in examining the relevance of keywords to documents in a corpus is discussed, and results from executing the algorithm are presented to verify the findings.
Abstract: In this paper, the use of TF-IDF (term frequency-inverse document frequency) in examining the relevance of keywords to documents in a corpus is discussed. The study focuses on how the algorithm can be applied to a number of documents. First, the working principle and the steps to follow when implementing TF-IDF are elaborated. Secondly, in order to verify the findings from executing the algorithm, results are presented, and the strengths and weaknesses of the TF-IDF algorithm are compared. The paper also discusses how such weaknesses can be tackled. Finally, the work is summarized and future research directions are discussed. Keywords: Text Mining, Text Analytics

328 citations
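To make the working principle concrete, here is a minimal sketch of the computation in Python, using the common tf × log(N/df) weighting (one of several TF-IDF variants; the paper's exact formulation may differ):

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute TF-IDF weights for a small corpus of tokenized documents."""
    n = len(docs)
    # document frequency: in how many documents each term appears
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return weights

docs = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "quantum computing is fascinating".split(),
]
w = tf_idf(docs)
```

Here "the" appears in two of the three documents, so its IDF dampens its weight, while "mat", unique to one document, scores higher in that document despite occurring only once.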


Journal ArticleDOI
TL;DR: This paper explains how to use blockchain technology in the pharmaceutical supply chain to add traceability, visibility and security to the drug supply system; the proposed system would be used in the pharmaceutical industry to track drugs from manufacture until delivery to the patient.
Abstract: The production and distribution of counterfeit drugs is an urgent and increasingly critical worldwide issue, especially in developing countries. The market value of pharmaceutical counterfeiting has reached billions of dollars annually. One of the reasons for drug counterfeiting is the imperfect supply chain system in the pharmaceutical industry. Drugs change ownership from manufacturer to wholesaler, distributor and then pharmacist before they reach the customer. In the current supply chain system, information is not shared between systems: manufacturers do not know what happens to their products, the drug regulatory authority has no visibility of the system, recalls are complicated and costly, and companies cannot follow up with patients. In this paper we explain how to use blockchain technology in the pharmaceutical supply chain to add traceability, visibility and security to the drug supply system. The proposed system would be used in the pharmaceutical industry to track drugs from manufacture until delivery to the patient. After the usage of a drug, its effect on the patient is recorded in a database for future statistics. A permissioned blockchain is used for storing transactions, and only trusted parties are allowed to join the network and push data to the blockchain.

117 citations
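The traceability idea rests on hash-chaining custody records so that any later modification is detectable. A minimal, hypothetical sketch of that core mechanism (omitting permissioning, consensus and networking, which a real permissioned blockchain platform would provide):

```python
import hashlib
import json

def make_block(record, prev_hash):
    """One block of a drug-custody ledger, hash-chained to its predecessor."""
    body = {"record": record, "prev_hash": prev_hash}
    body["hash"] = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    return body

def verify_chain(chain):
    """Recompute every hash and link; any tampering breaks verification."""
    prev = "0" * 64
    for block in chain:
        body = {"record": block["record"], "prev_hash": block["prev_hash"]}
        expected = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if block["prev_hash"] != prev or block["hash"] != expected:
            return False
        prev = block["hash"]
    return True

chain, prev = [], "0" * 64
for handoff in ["manufacturer->wholesaler", "wholesaler->distributor",
                "distributor->pharmacy", "pharmacy->patient"]:
    block = make_block(handoff, prev)
    chain.append(block)
    prev = block["hash"]

assert verify_chain(chain)
chain[1]["record"] = "wholesaler->counterfeiter"   # tampering...
assert not verify_chain(chain)                     # ...is detected
```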


Journal ArticleDOI
TL;DR: This paper focuses on the application of Firebase with Android and aims at familiarizing readers with its concepts, related terminology, advantages and limitations; it also demonstrates some of the features of Firebase by developing an Android app.
Abstract: Web applications have become more and more reliant upon large amounts of data and unstructured content such as videos, images, audio, text, files and other arbitrary types. It is difficult for a Relational Database Management System (RDBMS) to handle unstructured data. Firebase is a relatively new technology for handling large amounts of unstructured data, and it is very fast compared to an RDBMS. This paper focuses on the application of Firebase with Android and aims at familiarizing readers with its concepts, related terminology, advantages and limitations. The paper also demonstrates some of the features of Firebase by developing an Android app.

99 citations


Journal ArticleDOI
TL;DR: This paper presents a review of techniques used to design chatbots, with a few examples of chatbot designs discussed to give an understanding of how chatbots work and what types of approaches are available for chatbot development.
Abstract: Recently, the development of conversational systems as a medium of conversation between human and computer has made great strides. This human-computer communication has paved the way for numerous natural language processing techniques. A chatbot is a computer system that allows humans to interact with a computer using natural language. Chatbot systems are widely used in various fields such as business, education, healthcare and many more. The design and development of chatbots involves a variety of techniques. In this paper, we therefore present a review of techniques used to design chatbots. A few examples of chatbot designs are also discussed to give an understanding of how chatbots work and what types of approaches are available for chatbot development. With the rapid development of chatbot technology, it is hoped that chatbots can complement human constraints and optimize productivity.

44 citations
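Among the approaches such a review covers, the oldest is rule-based pattern matching in the spirit of ELIZA and AIML. A toy sketch with entirely made-up rules:

```python
import re

# Hypothetical rules in the style of classic pattern-matching chatbots;
# real systems use far richer techniques (retrieval, generative models, etc.).
RULES = [
    (re.compile(r"\b(hi|hello|hey)\b", re.I), "Hello! How can I help you?"),
    (re.compile(r"\bopening hours?\b", re.I), "We are open 9am-5pm, Monday to Friday."),
    (re.compile(r"\b(bye|goodbye)\b", re.I), "Goodbye!"),
]

def reply(message):
    """Return the response of the first matching rule, or a fallback."""
    for pattern, response in RULES:
        if pattern.search(message):
            return response
    return "Sorry, I did not understand that."

print(reply("Hey there"))  # matches the greeting rule
```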


Journal ArticleDOI
TL;DR: This paper provides insight into the existing algorithms and gives an overall summary of the existing work.
Abstract: After the brain, the heart is the most vital organ in the human body. It pumps blood and supplies it to all the organs of the body. Predicting the occurrence of heart disease is significant work in the medical field. Data analytics is useful for making predictions from large amounts of information, and it helps medical centres predict various diseases. A huge amount of patient-related data is maintained on a monthly basis, and this stored data can be a useful source for predicting the occurrence of future disease. Several data mining and machine learning techniques are used to predict heart disease, such as Artificial Neural Networks (ANN), Decision Trees, Fuzzy Logic, K-Nearest Neighbour (KNN), Naïve Bayes and Support Vector Machines (SVM). This paper provides insight into the existing algorithms and gives an overall summary of the existing work.

42 citations
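As an illustration of one listed technique, K-Nearest Neighbour classifies a new patient by majority vote among the most similar stored records. A sketch on made-up data (the feature names are illustrative, not taken from the paper):

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    dists = sorted(
        (math.dist(features, query), label) for features, label in train
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Toy, invented records: (age, resting blood pressure, cholesterol) -> diagnosis
train = [
    ((63, 145, 233), "disease"), ((67, 160, 286), "disease"),
    ((41, 130, 204), "healthy"), ((37, 120, 215), "healthy"),
    ((56, 140, 294), "disease"), ((44, 118, 242), "healthy"),
]
print(knn_predict(train, (62, 150, 270)))  # → disease
```

In practice the features would be normalized first, since raw Euclidean distance lets large-valued features (here cholesterol) dominate.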


Journal ArticleDOI
TL;DR: A review of the process, techniques and applications of predictive analytics is presented; predictive models are helpful in identifying risks and opportunities for every individual customer, employee or manager of an organization.
Abstract: Predictive analytics is a term used mainly for statistical and analytical techniques. It draws from statistics, machine learning, database techniques and optimization techniques, and has its roots in classical statistics. It predicts the future by analyzing current and historical data: future events and the behavior of variables can be predicted using predictive analytics models. Most predictive analytics models produce a score; a higher score indicates a higher likelihood that an event will occur, and a lower score indicates a lower likelihood. These models exploit patterns in historical and transactional data to find solutions for many business and science problems, and they are helpful in identifying risks and opportunities for every individual customer, employee or manager of an organization. With the increase in attention towards decision support solutions, predictive analytics models have come to dominate this field. In this paper, we present a review of the process, techniques and applications of predictive analytics.

41 citations
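The scoring idea described above can be illustrated with a logistic model, which maps a feature vector to a likelihood-like score between 0 and 1 (the weights and features below are invented for illustration):

```python
import math

def score(weights, bias, features):
    """Map a feature vector to a 0-1 score via the logistic function:
    higher scores mean a higher likelihood of the event occurring."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1 / (1 + math.exp(-z))

# Hypothetical churn model: features are (months inactive, support tickets)
weights, bias = (0.8, 0.5), -3.0
low_risk = score(weights, bias, (1, 0))    # recently active customer
high_risk = score(weights, bias, (6, 3))   # long-inactive, many complaints
print(round(low_risk, 2), round(high_risk, 2))  # → 0.1 0.96
```

In a deployed system the weights would be fitted to historical data rather than chosen by hand, but the ranking-by-score behavior is the same.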


Journal ArticleDOI
TL;DR: The aim of the paper is to detect phishing URLs and to narrow down to the best machine learning algorithm by comparing the accuracy rate, false positive rate and false negative rate of each algorithm.
Abstract: Phishing is one of the simplest ways to obtain sensitive information from innocent users. The aim of phishers is to acquire critical information such as usernames, passwords and bank account details. Cyber security practitioners are now looking for trustworthy and stable detection techniques for phishing websites. This paper deals with machine learning technology for the detection of phishing URLs by extracting and analyzing various features of legitimate and phishing URLs. Decision Tree, Random Forest and Support Vector Machine algorithms are used to detect phishing websites. The aim of the paper is to detect phishing URLs and to narrow down to the best machine learning algorithm by comparing the accuracy rate, false positive rate and false negative rate of each algorithm.

36 citations
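Feature extraction from URLs, the first step the abstract describes, can be sketched as follows; the features shown are a common illustrative subset from the phishing-detection literature, not necessarily the paper's exact feature set:

```python
import re
from urllib.parse import urlparse

def url_features(url):
    """A few lexical features commonly used in phishing-URL studies."""
    host = urlparse(url).netloc
    return {
        "length": len(url),                             # phishing URLs tend to be long
        "has_at": "@" in url,                           # '@' can hide the real destination
        "has_ip": bool(re.fullmatch(r"[\d.]+", host)),  # raw IP instead of a domain name
        "hyphens": host.count("-"),                     # hyphen-heavy lookalike domains
        "subdomains": max(host.count(".") - 1, 0),      # deep subdomain nesting
    }

f = url_features("http://secure-login.paypal.com.evil.example/@verify")
print(f["has_at"], f["subdomains"])  # → True 3
```

Each URL's feature dictionary would then become one row of the training matrix fed to the classifiers the abstract lists.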



Journal ArticleDOI
TL;DR: A proposed algorithm, optimized to work with Ghanaian vehicle number plates and written in C++ with the OpenCV library, uses edge detection and feature detection techniques combined with mathematical morphology to locate the plate.
Abstract: Automatic Number Plate Recognition (ANPR) is a fairly well explored problem with many successful solutions. However, these solutions are typically tuned towards a particular environment due to the variations in the features of number plates across the world. Algorithms written for number plate recognition are based on these features, so a universal solution would be difficult to realize, as the image analysis techniques used to build these algorithms cannot themselves boast one hundred percent accuracy. The focus of this paper is a proposed algorithm that is optimized to work with Ghanaian vehicle number plates. The algorithm, written in C++ with the OpenCV library, uses edge detection and feature detection techniques combined with mathematical morphology to locate the plate. The Tesseract OCR engine is then used to identify the detected characters on the plate.

30 citations


Journal ArticleDOI
TL;DR: Fog computing is an extended version of cloud computing with the same data storage and computation capabilities, but it is fundamentally distributed in nature, providing services at the edge of the network.
Abstract: The Internet has evolved in ways that we could never have imagined. In the beginning, advancements occurred slowly; today, innovation and communication are happening at a remarkable rate, and the Internet has become one of the most important aspects of our lives. We have moved from the desktop era of the late 90s, when one had to go to the device to solve a problem, through the era of smart devices in the early 2000s, when everybody carried devices in their pockets, to the emerging era of the Internet of Everything, where the aim is to connect every unconnected device on the planet. Even though cloud computing has played an efficient role in the computation and processing of these data, challenges such as security and privacy issues still cannot be resolved by cloud computing alone. To overcome these limitations, the term fog computing has emerged, providing computing resources at the edge of the network. Fog computing is an extended version of cloud computing with the same data storage and computation capabilities, but it is fundamentally distributed in nature, providing services at the edge of the network. In this paper, I give a brief description of fog computing, elaborate on its architecture, highlight a few feasible applications, and discuss the current security and privacy issues, along with recommended security measures, that we will face while deploying the Internet of Things in a live environment. General Terms: Security, Encryption, Algorithm, Threats.

30 citations


Journal ArticleDOI
TL;DR: An investigation of three contrast enhancement techniques for image enhancement concluded that Histogram Equalization (HE) is the best contrast enhancement technique, as it recorded high quality images.
Abstract: Image enhancement is one of the key techniques for improving the quality of images in imaging systems. The main purpose of image enhancement is to bring out detail that is hidden in an image or to increase the contrast of a low-contrast image. It provides a multitude of choices for improving the visual quality of images, which is the main reason image enhancement is used in a huge number of applications with important challenges such as noise reduction, degradation and blurring. This paper focuses on three contrast enhancement techniques: Histogram Equalization (HE), Adaptive Histogram Equalization (AHE) and Contrast Limited Adaptive Histogram Equalization (CLAHE), which are compared with the help of eight (8) image quality measurement metrics: Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Peak Signal-to-Noise Ratio (PSNR), Mean Absolute Error (MAE), Signal-to-Noise Ratio (SNR), Image Quality Index (IQI), Similarity Index (SI) and Pearson Correlation Coefficient (r). The paper concluded that Histogram Equalization (HE) is the best of the three contrast enhancement techniques, as it recorded high quality images.
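Of the three techniques compared, plain Histogram Equalization is simple enough to sketch directly: remap each intensity through the image's normalized cumulative histogram so the output uses the dynamic range more evenly.

```python
def equalize(pixels, levels=256):
    """Histogram equalization for a flat list of grayscale pixel values."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # cumulative distribution function of the intensities
    cdf, total = [], 0
    for count in hist:
        total += count
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(pixels)
    # classic HE mapping: scale the CDF onto [0, levels-1]
    lut = [round((c - cdf_min) / (n - cdf_min) * (levels - 1)) if n > cdf_min else 0
           for c in cdf]
    return [lut[p] for p in pixels]

# A low-contrast "image": values squeezed into [100, 103]
flat = [100] * 4 + [101] * 4 + [102] * 4 + [103] * 4
print(sorted(set(equalize(flat))))  # → [0, 85, 170, 255]
```

AHE and CLAHE apply the same idea per local tile (CLAHE additionally clipping the histogram), which is why they behave differently on the quality metrics listed above.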

Journal ArticleDOI
TL;DR: A working model of an ontology-based chatbot is proposed that handles queries from users of an E-commerce website and helps the user by mapping relationships between the various entities the user requires, thus providing detailed and accurate information and thereby overcoming the drawbacks of traditional chatbots.
Abstract: A working model of an ontology-based chatbot is proposed that handles queries from users of an E-commerce website. It is mainly concerned with providing the user total control over the search results on the website. This chatbot helps the user by mapping relationships between the various entities the user requires, thus providing detailed and accurate information and thereby overcoming the drawbacks of traditional chatbots. The ontology template is developed using Protégé, which stores the knowledge acquired from the website APIs, while the dialog manager is handled using Wit.ai. The integration of the dialog manager and the ontology template is managed through Python. The response to the query is formatted and returned to the user in the dialog manager.

Journal ArticleDOI
TL;DR: A classifier approach for predicting weather conditions is introduced, showing how the Naive Bayes and Chi-square algorithms can be utilized for classification.
Abstract: Weather forecasting is the use of science and technology to predict the condition of the weather for a given area. It is one of the most difficult problems the world over. This project aims to estimate the weather by utilizing predictive analysis; for this reason, various data mining procedures need to be analyzed before being applied. This paper introduces a classifier approach for the prediction of weather conditions and shows how the Naive Bayes and Chi-square algorithms can be utilized for classification. The system is a web application with an effective graphical user interface. Users log in to the system with their user ID and password and enter information such as the current outlook, temperature, humidity and wind condition. The system takes these parameters and predicts the weather after comparing the input with the information in the database. Consequently, two basic functions, namely classification (training) and prediction (testing), are performed. The outcomes demonstrate that these data mining procedures can be sufficient for weather forecasting. General Terms: Data Mining, Classification, Prediction.
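The Naive Bayes step can be sketched on a toy categorical dataset (records invented in the spirit of the classic play-tennis example; a real system would also apply smoothing for unseen values):

```python
from collections import Counter

def naive_bayes(data, query):
    """P(class | features) ∝ P(class) * product of P(feature_i | class),
    with probabilities estimated by simple counting over categorical data."""
    labels = Counter(label for _, label in data)
    scores = {}
    for label, label_count in labels.items():
        rows = [feats for feats, l in data if l == label]
        score = label_count / len(data)            # class prior
        for i, value in enumerate(query):
            matches = sum(1 for feats in rows if feats[i] == value)
            score *= matches / label_count         # conditional likelihood
        scores[label] = score
    return max(scores, key=scores.get)

# (outlook, humidity, wind) -> play?
data = [
    (("sunny", "high", "weak"), "no"), (("sunny", "high", "strong"), "no"),
    (("overcast", "high", "weak"), "yes"), (("rain", "high", "weak"), "yes"),
    (("rain", "normal", "weak"), "yes"), (("rain", "normal", "strong"), "no"),
    (("overcast", "normal", "strong"), "yes"), (("sunny", "normal", "weak"), "yes"),
]
print(naive_bayes(data, ("overcast", "normal", "weak")))  # → yes
```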

Journal ArticleDOI
TL;DR: The method of how partitional (k-means) clustering works for text document clustering is explored, with particular focus on one of the basic disadvantages of k-means: choosing the true value of K.
Abstract: In the field of data mining, the approach of assigning a set of items to similar classes, called clusters, is termed clustering. Document clustering has been a rapidly developing research area for decades and is considered a vital task for text mining due to the exceptional expansion of documents in cyberspace. It provides the opportunity to organize large amounts of scattered text into meaningful clusters and lays the foundation for smooth descriptive browsing and navigation systems. One of the most often used partitioning algorithms is k-means, which is frequently applied to text clustering due to its ability to converge to a local optimum even on enormous sparse matrices. Its objective is to make the distance between items or data points belonging to the same cluster as short as possible. This paper explores how partitional (k-means) clustering works for text document clustering, and in particular one of the basic disadvantages of k-means: choosing the true value of K. The true K tells us how many clusters should be made in our dataset, but it is often ambiguous, and automatically selecting a suitable value for K is a tough algorithmic problem. Many variants of k-means have been presented to estimate its value, and a range of different probing techniques have been proposed by multiple researchers. This paper explains how to apply some of these techniques for finding the true value of K in a text dataset.
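One common probing technique for the true K is the elbow method: run k-means for several values of K and look for the point where the within-cluster sum of squared errors (SSE) stops dropping sharply. A 1-D sketch:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain 1-D k-means: alternate assignment and centroid update,
    then report the total within-cluster sum of squared errors (SSE)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: (p - centroids[i]) ** 2)
            clusters[nearest].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    sse = sum(min((p - c) ** 2 for c in centroids) for p in points)
    return centroids, sse

# Two well-separated groups, so the "true" K is 2: the SSE drops sharply
# from K=1 to K=2 and then flattens, which is the elbow.
points = [1.0, 1.2, 0.8, 1.1, 9.0, 9.3, 8.7, 9.1]
sses = {k: kmeans(points, k)[1] for k in (1, 2, 3)}
```

For text, the same loop runs over sparse TF-IDF vectors with cosine or Euclidean distance instead of 1-D values, but the elbow diagnostic is identical.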

Journal ArticleDOI
TL;DR: This paper presents various existing techniques used to detect diseases of agricultural products, and surveys the methodologies utilized for disease detection, segmentation of the affected part and classification of the diseases.
Abstract: Quality agricultural production is an essential trait for any nation's economic growth, so recognition of the deleterious regions of plants can be considered a solution for preventing the reduction of crops and productivity. The traditional approach to disease detection and classification requires an enormous amount of time, an extreme amount of work and continuous farm monitoring. In the last few years, advancements in technology and researchers' focus on this area have made it possible to obtain optimized solutions. To identify and detect disease on agricultural products, popular methods from fields like machine learning, image processing and classification have been utilized. This paper presents various existing techniques used to detect diseases of agricultural products. The paper also surveys the methodologies utilized for disease detection, segmentation of the affected part and classification of the diseases. It includes a summary of various feature extraction techniques, segmentation techniques and classifiers along with their benefits and drawbacks.

Journal ArticleDOI
TL;DR: An overview of existing recommender approaches used in tourism is presented, and their relevance is discussed taking into account the tourism context and its specificities.
Abstract: Recommender systems have become an active research topic during the last two decades, giving rise to several approaches and techniques. They have also become increasingly popular among practitioners and are used in a variety of areas including movies, news, books, research articles, restaurants, garments, financial services, insurance, social tags and products in general. Tourism is an important sector for economic development and a potential application area for recommender systems. This paper presents an overview of existing recommender approaches used in tourism and discusses their relevance taking into account the tourism context and its specificities. General Terms: Information systems; information retrieval; retrieval tasks and goals; recommender systems.
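A representative approach from the family such an overview covers is user-based collaborative filtering. A toy sketch with invented tourist ratings:

```python
import math

def cosine(u, v):
    """Cosine similarity over the items two users have both rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    return dot / (math.sqrt(sum(u[i] ** 2 for i in common)) *
                  math.sqrt(sum(v[i] ** 2 for i in common)))

def recommend(ratings, user):
    """Score unseen attractions by similarity-weighted ratings of other users."""
    scores = {}
    for other, their in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], their)
        for item, r in their.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0) + sim * r
    return max(scores, key=scores.get)

# Made-up tourist ratings (1-5) of attractions
ratings = {
    "ana":  {"museum": 5, "beach": 2, "castle": 4},
    "ben":  {"museum": 4, "castle": 5, "hiking": 4},
    "carl": {"beach": 5, "nightlife": 5, "museum": 1},
}
print(recommend(ratings, "ana"))  # → hiking
```

Tourism-specific systems typically extend this with contextual signals (season, location, group composition), which is exactly the specificity the paper discusses.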

Journal ArticleDOI
TL;DR: Various methods used to enhance the speech signal are discussed, including the Wiener filter, statistical methods, the subspace method and basic spectral subtraction with its variants.
Abstract: Speech enhancement is used in almost all modern communication systems. It is obvious that when speech is transmitted, its quality may degrade due to interference in the environment it passes through. Some of the interferences that may affect speech quality in transit include acoustic additive noise, acoustic reverberation and white Gaussian noise. This paper focuses on the techniques that have appeared in the literature to enhance the speech signal. The various methods used include the Wiener filter, statistical methods, the subspace method and basic spectral subtraction with its variants. The authors discuss these methods along with their advantages and disadvantages. The discussion also reviews studies conducted by other researchers on machine learning techniques, such as Neural Networks, Deep Neural Networks and Convolutional Neural Networks, and on optimization techniques used for the enhancement of speech.
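Basic spectral subtraction, the simplest method listed, can be sketched with NumPy; for clarity this toy uses the true noise as the noise estimate, whereas a real system estimates the noise spectrum from speech-free frames:

```python
import numpy as np

def spectral_subtraction(noisy, noise_estimate):
    """Basic spectral subtraction: subtract the estimated noise magnitude
    spectrum from the noisy magnitude spectrum, keep the noisy phase,
    and floor negative magnitudes at zero."""
    spectrum = np.fft.rfft(noisy)
    noise_mag = np.abs(np.fft.rfft(noise_estimate))
    clean_mag = np.maximum(np.abs(spectrum) - noise_mag, 0.0)
    phase = np.angle(spectrum)
    return np.fft.irfft(clean_mag * np.exp(1j * phase), n=len(noisy))

rng = np.random.default_rng(0)
t = np.arange(512) / 512
tone = np.sin(2 * np.pi * 40 * t)          # stands in for the speech component
noise = 0.5 * rng.standard_normal(512)     # stationary background noise
enhanced = spectral_subtraction(tone + noise, noise)
# Residual error against the clean signal shrinks after subtraction
print(np.mean((enhanced - tone) ** 2) < np.mean(noise ** 2))  # → True
```

The well-known weakness of this method, "musical noise" from the hard floor at zero, is one of the disadvantages such a survey weighs against the Wiener filter and statistical approaches.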

Journal ArticleDOI
TL;DR: An automated skin disease diagnosis system is presented that takes an image of a skin condition as input from the user and predicts the type of skin disease, effectively amalgamating image processing and machine learning.
Abstract: Dermatology is one of the major fields of medicine, concerned with the diagnosis and treatment of skin disorders. Skin diseases are among the most widely recognized medical issues around the world. Despite being common, their diagnosis is quite difficult and requires broad knowledge and expertise in the area. Skin disease can have severe health and monetary consequences for patients if not detected and controlled early; early recognition can prevent the condition from worsening. This research paper presents the development of an automated skin disease diagnosis system which takes an image of a skin condition as input from the user and predicts the type of skin disease. The system uses a dual-stage approach for the detection and prediction process which effectively amalgamates image processing and machine learning. In the first stage, the image of the skin condition is subjected to numerous pre-processing techniques followed by feature extraction, and the extracted features for each image are converted to a feature vector. In the second stage, the feature vectors are fed to a machine learning algorithm (artificial neural networks) to identify the disease and predict accordingly. On training and testing for 5 diseases (eczema, psoriasis, impetigo, melanoma, and scleroderma) the system produces an overall prediction accuracy of 90%.

Journal ArticleDOI
TL;DR: How big data has impacted the machine learning community, the significance of machine learning, and how blockchain technology could similarly impact the machine learning community are discussed.
Abstract: The importance of big data in machine learning cannot be overemphasized in recent times. Through the evolution of big data, most scientific technologies that relied heavily on enormous data for solving complex issues in human lives gained ground; machine learning is an instance of these technologies. Various machine learning models that yield groundbreaking throughput with high efficiency in predicting, detecting, classifying, discovering and acquiring in-depth knowledge about events that would otherwise be very difficult to ascertain have been made possible by big data. Although big data has undoubtedly helped the field of machine learning research over the years, its mode of acquisition has posed a great challenge to industries, educational institutions and other agencies that obtain data for various purposes. This is because such large quantities of data cannot be stored on personal computers with limited storage capacity, but require high-capacity servers for effective storage. These servers may be owned by a group of companies or individuals who have the singular privilege of modifying the data in their possession as and when deemed relevant, thus creating a centralized data storage environment; they are mostly referred to as the Third Parties (TP) in the data acquisition process. For the services they render, these trusted parties price the data in their possession expensively. The adverse effect is a limitation on research that could help solve a number of problems in human lives. It is worth mentioning that the security of the data being purchased expensively cannot even be assured, further limiting research that thrives on secured data. In order to curb these problems and build better machine learning models, the incorporation of blockchain technology databases into machine learning is proposed. This paper discusses the concepts of big data, machine learning and blockchains.
It further discusses how big data has impacted the machine learning community, the significance of machine learning, and how blockchain technology could similarly impact the machine learning community. The aim of this paper is to encourage further research into incorporating blockchain technology into machine learning.

Journal ArticleDOI
TL;DR: This research paper gives a fair idea of phishing attacks, the types of phishing attacks through which the attacks are performed, and their detection and prevention.
Abstract: Nowadays there are a lot of data security issues. Hackers are now very expert at using their knowledge to hack into someone else's system and grab information. Phishing is one such methodology used to acquire information. Phishing is a cybercrime in which emails, telephone calls and text messages are used to target personally identifiable information, banking details, credit card details and passwords. Phishing is mainly a form of online identity theft. Social engineering is used by the phisher to steal the victim's personal data and account details. This research paper gives a fair idea of phishing attacks, the types of phishing attacks through which the attacks are performed, and their detection and prevention.


Journal ArticleDOI
TL;DR: The study verified that data mining techniques can be used to predict students' academic performance in higher educational institutions and recommended the adoption of SVR and LR methods to predict the final CGPA8.
Abstract: Predicting students' academic performance is very crucial, especially for higher educational institutions. This paper presents an application to assist higher education institutions in predicting their students' academic performance at an early stage before graduation and decreasing student dropout. The performance of the students was measured by the cumulative grade point average (CGPA) at semester eight. The students' course scores for core and non-core courses from the first semester to the sixth semester are used as predictor variables for predicting the final CGPA8 upon graduation using Neural Networks (NN), Support Vector Regression (SVR) and Linear Regression (LR). The study verified that data mining techniques can be used to predict students' academic performance in higher educational institutions. All the experiments gave valid results and can be used to predict graduation CGPA. However, the experiments were compared to determine which approaches perform better than others. Generally, the SVR and LR methods performed better than NN. Therefore, we recommend the adoption of the SVR and LR methods to predict the final CGPA8; the models can also be used to implement a Student Performance Prediction System (SPPS) in a university. Thus, the study used the models from the SVR and LR methods to design an application that performs the prediction task.
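The Linear Regression approach recommended above reduces, in its simplest single-predictor form, to ordinary least squares; the data below are invented for illustration (the paper actually uses per-course scores over six semesters as predictors):

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (one predictor)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

# Made-up pairs: (mean CGPA over semesters 1-6, final CGPA at semester 8)
early = [2.1, 2.8, 3.0, 3.4, 3.7, 2.5]
final = [2.2, 2.9, 3.1, 3.5, 3.8, 2.6]
a, b = fit_line(early, final)
predicted = a + b * 3.2   # hypothetical student with an early CGPA of 3.2
```

SVR follows the same "features in, CGPA out" shape but fits within an epsilon-insensitive margin, which is why the two methods are compared head-to-head in the paper.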

Journal ArticleDOI
TL;DR: The proposed research work analyzes various machine learning algorithms applied to plant disease prediction, dealing with decision tree, Naive Bayes, artificial neural network, k-means clustering and random forest algorithms.
Abstract: Machine learning is one of the branches of Artificial Intelligence concerned with making a system work automatically, or follow given instructions, to perform an action. The goal of machine learning is to understand the structure of the data and fit that data into models that can be understood and utilized by people. The proposed research work analyzes various machine learning algorithms applied to plant disease prediction. A plant shows some visible effects of disease as a response to the pathogen; visible features such as shape, size, dryness and wilting are very helpful in recognizing the plant's condition. The research paper deals with all such features and applies various machine learning technologies to find the output. The research work deals with decision tree, Naive Bayes, artificial neural network, k-means clustering and random forest algorithms. Disease development depends on three conditions: a host plant susceptible to disease, a favorable environment and a viable pathogen. All three conditions must be present for a disease to occur.

Journal ArticleDOI
TL;DR: This survey paper is an exploration of the XMPP, AMQP and LWM2M protocols, together with MQTT and CoAP, which are extensively used in most M2M communication, and exemplifies the comparison between them.
Abstract: The concept of the Internet of Things emerged a long time ago, and there has been enormous development in sensing devices and everyday objects connected to the internet. With the current internet infrastructure, wireless communication plays a vital role for IoT devices, allowing them to transmit messages. The trustworthiness of these messages therefore lies in authentication, and numerous key management techniques have been introduced to provide secured transmission over the internet. In the context of IoT, many protocols have been devised for authenticated and secured transmission, including XMPP, AMQP and LWM2M. In addition to the above, MQTT and CoAP are also extensively used protocols in most M2M communication. This survey paper is an exploration of these protocols and also exemplifies the comparison between them.
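As a small concrete detail of the MQTT protocol mentioned above, subscriptions use topic filters with `+` (one level) and `#` (remaining levels) wildcards; the matching rule can be sketched as:

```python
def topic_matches(filter_, topic):
    """MQTT-style topic filter matching: '+' matches exactly one level,
    '#' (only valid as the last level) matches any remaining levels."""
    f, t = filter_.split("/"), topic.split("/")
    for i, part in enumerate(f):
        if part == "#":
            return i == len(f) - 1
        if i >= len(t):
            return False
        if part != "+" and part != t[i]:
            return False
    return len(f) == len(t)

print(topic_matches("sensors/+/temperature", "sensors/kitchen/temperature"))  # → True
```

This level-by-level topic routing is one of the design points on which MQTT differs from the XML-stanza addressing of XMPP and the exchange/queue model of AMQP.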

Journal ArticleDOI
TL;DR: An attempt is made to recognize handwritten Bangla characters using a Convolutional Neural Network along with the inception-module method, without separate feature extraction, to achieve the highest success and accuracy rate.
Abstract: With the advancement of modern technology, the necessity of pattern recognition has increased a lot. Character recognition is a part of pattern recognition. In the last few decades there has been research on optical character recognition (OCR) for many languages such as Roman, Japanese, African, Chinese and English, as well as for Indian languages such as Tamil, Devanagari, Telugu and Gujarati, among others. There are very few works on handwritten Bangla character recognition. It is a challenging field because different people's Bangla handwriting varies in formation, stroke and angle. In some research, SVM, MLP, ANN, HMM, HLP and CNN have been used for handwritten Bangla character recognition. In this paper an attempt is made to recognize handwritten Bangla characters using a Convolutional Neural Network along with the inception-module method, without separate feature extraction: feature extraction occurs during the training phase rather than in the dataset preprocessing phase. As a CNN cannot take input data of varying shape, the dataset images had to be rescaled to a fixed size. In total the final dataset contains 100000 images of dimension 28x28; 85000 images are used for training and 3000 images for testing. After analyzing the results, a conclusion is drawn on the proposed work, and future goals and plans to achieve the highest success and accuracy rate are stated.

Journal ArticleDOI
TL;DR: Sentiment analysis in text documents is essentially a content-based classification problem involving concepts from the domains of Natural Language Processing (NLP) and Machine Learning (ML), as discussed by the authors.
Abstract: Social media is increasingly used by humans to express their feelings and opinions in the form of short text messages. Detecting sentiment in text has a wide range of applications, including identifying anxiety or depression in individuals and measuring the well-being or mood of a community. Sentiment can be expressed in many ways, such as facial expressions and gestures, speech, and written text. Sentiment analysis in text documents is essentially a content-based classification problem involving concepts from the domains of Natural Language Processing as well as Machine Learning. In this paper, sentiment recognition based on textual data and the techniques used in sentiment analysis are discussed.
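One of the simplest techniques discussed in such surveys is lexicon-based scoring; a toy sketch with an invented polarity lexicon and naive negation handling:

```python
# A tiny, made-up polarity lexicon; real systems use resources such as
# SentiWordNet or trained classifiers instead.
LEXICON = {"good": 1, "great": 2, "happy": 1, "bad": -1, "awful": -2, "sad": -1}
NEGATIONS = {"not", "no", "never"}

def sentiment(text):
    """Sum word polarities, flipping the sign of the word after a negation."""
    score, flip = 0, 1
    for word in text.lower().split():
        if word in NEGATIONS:
            flip = -1
            continue
        score += flip * LEXICON.get(word, 0)
        flip = 1
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(sentiment("not happy today"))  # → negative
```

Machine learning approaches replace the fixed lexicon with weights learned from labeled messages, which handles sarcasm and context far better than this word-counting baseline.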

Journal ArticleDOI
TL;DR: A Recurrent Neural Network (RNN) is used to predict a student's final result from the first- and second-term classes along with fifteen other features, helping the teacher identify students who are 'at risk' so that proper remedies can be offered to them.
Abstract: Educational Data Mining has gained a considerable amount of attention from researchers in educational technology in recent times. In this paper, a Recurrent Neural Network (RNN) is used to predict a student's final result. An RNN is a variant of neural network that can handle time series data. The final-term class is predicted using the first- and second-term classes along with fifteen other features of a student. This analysis helps the teacher identify students who are 'at risk' and, based on that, offer proper remedies to them. In this paper, a comparison-based study is also made between Artificial Neural Networks, Deep Neural Networks and the proposed Recurrent Neural Network.

Journal ArticleDOI
TL;DR: A new method for adding parameters to a well-established distribution to obtain more flexible new families of distributions is applied to the inverse Weibull distribution (IWD).
Abstract: A new method for adding parameters to a well-established distribution to obtain more flexible new families of distributions is applied to the inverse Weibull distribution (IWD). This method is known as the Alpha-Power Transformation (APT) and was introduced by Mahdavi and Kundu [9]. The statistical and reliability properties of the proposed models are studied. The estimation of the model parameters by maximum likelihood and the observed information matrix are also discussed. The extended model is applied to real data, and the results are given and compared to other models.
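For reference, the Alpha-Power transformation of Mahdavi and Kundu maps a baseline CDF F(x) to a new one-extra-parameter family; written with a common two-parameter inverse Weibull baseline (the paper's exact parameterization may differ):

```latex
F_{\mathrm{APT}}(x) =
\begin{cases}
\dfrac{\alpha^{F(x)} - 1}{\alpha - 1}, & \alpha > 0,\ \alpha \neq 1,\\[4pt]
F(x), & \alpha = 1,
\end{cases}
\qquad \text{with } F(x) = e^{-(\beta/x)^{\gamma}},\ x > 0,
\quad\Longrightarrow\quad
F_{\mathrm{APTIW}}(x) = \frac{\alpha^{\,e^{-(\beta/x)^{\gamma}}} - 1}{\alpha - 1}.
```

The extra parameter alpha reweights probability toward the upper or lower tail of the baseline, which is the source of the added flexibility the abstract refers to.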

Journal ArticleDOI
TL;DR: This study compares six different imputation methods to find the one that performs the most appropriate treatment for categorical data of ordinal type in a breast cancer dataset.
Abstract: The treatment of missing data has become a mandatory step for performing valid data analysis in most scientific research fields. In fact, researchers have found that dealing with missing data avoids misleading analysis and improves the quality and power of research results [1]. According to the authors in [2,3], the missing values in a data set can be missing completely at random (MCAR), missing at random (MAR), or missing not at random (MNAR), a categorization that should be taken into consideration when dealing with the problem of missing data. The number of observations, the types of variables, and the percentage of missing values in a data set are also important characteristics that should be contemplated before dealing with missing values. Understanding the missing data case helps researchers identify the imputation technique that best handles the missing data problem. However, procedures to impute categorical data are not as widely developed as those focused on continuous data imputation [1]. This study compares six different imputation methods to find the one that performs the most appropriate treatment for categorical data of ordinal type in a breast cancer dataset. General Terms: Data imputation; missing data.
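One of the simplest baselines among categorical imputation methods is mode imputation; a sketch on an invented ordinal column (the paper's six specific methods are not reproduced here):

```python
from collections import Counter

def impute_mode(values, missing=None):
    """Replace missing entries with the most frequent observed category."""
    observed = [v for v in values if v is not missing]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is missing else v for v in values]

# Hypothetical ordinal tumor-grade column with missing entries
grades = ["low", "medium", None, "medium", "high", None, "medium"]
print(impute_mode(grades))
# → ['low', 'medium', 'medium', 'medium', 'high', 'medium', 'medium']
```

Mode imputation ignores the ordering of ordinal categories and shrinks variance, which is exactly why studies like this one compare it against more sophisticated model-based alternatives.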